Nucleic acid and amino acid sequences relating to Candida albicans for diagnostics and therapeutics

ABSTRACT

The invention provides isolated polypeptide and nucleic acid sequences derived from Candida albicans that are useful in diagnosis and therapy of pathological conditions; antibodies against the polypeptides; and methods for the production of the polypeptides. The invention also provides methods for the detection, prevention and treatment of pathological conditions resulting from fungal infection.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is converted from U.S. provisional application Ser. No. 60/074,725, filed Feb. 13, 1998, and U.S. provisional application Ser. No. 60/096,409, filed Aug. 13, 1998.

FIELD OF THE INVENTION

The invention relates to isolated nucleic acids and polypeptides derived from Candida albicans that are useful as molecular targets for diagnostics, prophylaxis and treatment of pathological conditions, as well as materials and methods for the diagnosis, prevention, and amelioration of pathological conditions resulting from fungal infection.

BACKGROUND OF THE INVENTION

Candida albicans is a dimorphic fungus which has both a yeast-like growth habit and a filamentous form consisting of both hyphae and pseudohyphae. The fungus is a member of the normal surface flora of most individuals. Although no sexual state has been described for C. albicans, the genome is diploid in most strains (Whelan, W L et al. (1980) Mol. Gen. Genet. 180: 107-113; Whelan, W L and Magee, P T (1981) J. Bacteriol. 145: 896-903; Poulter, R. (1982) J. Bacteriol. 152: 969-975) and rearranges relatively frequently (Rustchenko-Bulgac, E P et al. (1990) J. Bacteriol. 172: 1276-1283; Barton, R C and Scherer, S (1994) J. Bacteriol. 176: 756-763). In addition, C. albicans uses a non-universal codon assignment in which the leucine codon CUG is translated as serine (Leuker et al. (1994) Mol. Gen. Genet. 245: 212-217; Santos et al. (1993) EMBO Journal 12: 607-616). These features make it difficult to apply the powerful genetic and molecular methods used in Saccharomyces and Schizosaccharomyces.
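
Because of the CUG reassignment, translating a C. albicans coding sequence with the standard genetic code places leucine where the organism actually inserts serine. The following Python sketch translates DNA with an optional CUG-to-serine override. The function names are hypothetical; the table is the standard code (NCBI translation table 1) with only the CTG codon changed, as in NCBI table 12 (alternative yeast nuclear code). Differences in start-codon usage between the two tables are ignored here.

```python
from typing import Dict

BASES = "TCAG"
# Standard genetic code (NCBI table 1): one-letter amino acids for all 64 codons,
# enumerated with the first, second and third base each running over T, C, A, G.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(cug_as_serine: bool) -> Dict[str, str]:
    """Build a DNA codon -> one-letter amino acid map ('*' marks a stop codon)."""
    table = {
        a + b + c: STANDARD_AA[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)
    }
    if cug_as_serine:
        table["CTG"] = "S"  # CUG is decoded as serine in C. albicans
    return table


def translate(seq: str, cug_as_serine: bool = True) -> str:
    """Translate an in-frame DNA or RNA sequence; a trailing partial codon is dropped."""
    table = codon_table(cug_as_serine)
    dna = seq.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(table[dna[p:p + 3]] for p in range(0, usable, 3))


print(translate("ATGCTGTAA"))                       # C. albicans reading
print(translate("ATGCTGTAA", cug_as_serine=False))  # standard-code reading
```

The two calls differ only at the CTG codon, which is the point of the example: the same gene can yield a different protein sequence depending on which code is assumed.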

C. albicans exists as part of the normal microbial flora in humans, but can produce opportunistic infections ranging from topical infections such as oral thrush to life-threatening disseminated mycoses (Ampel, N M (1996) Emerg. Infect. Dis. 2: 109-116). Candida is a major cause of nosocomial infections and was found to account for more than 75% of all fungal nosocomial infections reported by NNIS (National Nosocomial Infections Surveillance) hospitals from 1980 to 1990, a period in which fungi alone accounted for 7.9% of all nosocomial infections (Beck-Sagué, C M and Jarvis, W R (1993) J. Infect. Dis. 167: 1247-1251). Although Candida in infections is frequently traced to endogenous sources on the patient, it has also been traced to exogenous sources in the hospital environment, including contaminated solutions and equipment (Sherertz, R J et al. (1992) J. Pediatr. 120: 455-461; Weems, J J et al. (1987) J. Clin. Microbiol. 25: 1029-1032), and to health care workers (Hunter, P R et al. (1990) J. Med. Vet. Mycol. 28: 317-325; Burnie, J P (1986) J. Hosp. Infect. 8: 1-4; Doebbeling, B N et al. (1991) J. Clin. Microbiol. 29: 1268-1270). Numerous investigations into the molecular basis of pathogenicity have implicated the hyphal form (Lo, H J et al. (1997) Cell 90: 939-949), surface molecules including adhesins (Fukazawa, Y and Kagaya, K (1997) J. Med. Vet. Mycol. 35: 87-99), and ATP-binding cassette-containing multidrug resistance proteins (Prasad, R et al. (1995) Curr. Genet. 27: 320-329).

The antimicrobials currently in use against Candida are generally of three types: azoles, such as fluconazole, itraconazole, and clotrimazole; polyenes, such as amphotericin B and nystatin; and 5-fluorocytosine. Invasive infections, however, are treated primarily with fluconazole, amphotericin B, and 5-fluorocytosine, although the latter two compounds have significant toxic side effects. The development of resistance to fluconazole by C. albicans has been noted by a number of researchers (Redding, S (1994) Clin. Infect. Dis. 18: 339-346; Sangeorzan, J A (1994) Am. J. Med. 97: 339-346; Revankar, S G et al. (1996) J. Infect. Dis. 174: 821-827; Marr, K A et al. (1997) Clin. Infect. Dis. 25: 908-910). Relatively short treatments seem to result in few if any resistant isolates, but extended treatments, including the prophylactic treatments required among immunocompromised and AIDS patients, result in the appearance of fluconazole-resistant strains (Johnson, E M (1995) J. Antimicrob. Chemother. 35: 103-114). The development of fluconazole resistance has been observed to be associated with the development of amphotericin resistance (Vazquez, J A (1996) Antimicrob. Agents Chemother. 40: 2511-2516; Nolte, F S et al. (1997) Antimicrob. Agents Chemother. 41: 196-199; White, T C (1997) ASM News 63: 427-433), consistent with the action of both drugs on ergosterol in the membrane.

The difficulty in diagnosing C. albicans infections, the limited spectrum of current therapeutic drugs and the development of drug-resistant strains of C. albicans provide the rationale for identifying targets for more rapid and effective methods of identification, prevention, and treatment of candidiasis. Elucidation of the C. albicans genome would improve understanding of how C. albicans, as well as other fungi, causes invasive disease and how best to combat fungal infection.

SUMMARY OF THE INVENTION

The present invention fulfills the need for diagnostic tools and therapeutics by providing fungal-specific compositions and methods for detecting, treating, and preventing fungal infection, in particular C. albicans infection. These compositions can also be used as biocontrol agents for plants.

The present invention encompasses isolated nucleic acids and polypeptides derived from C. albicans that are useful as reagents for diagnosis of fungal disease, as components of effective antifungal vaccines, and/or as targets for antifungal drugs, including anti-C. albicans drugs. They can also be used to detect the presence of C. albicans and other Candida species in a sample, and to screen compounds for the ability to interfere with the C. albicans life cycle or to inhibit C. albicans infection.

More specifically, this invention features compositions of nucleic acids corresponding to entire coding sequences of C. albicans proteins, including surface or secreted proteins or parts thereof; nucleic acids capable of binding mRNA encoding C. albicans proteins to block protein translation; and methods for producing C. albicans proteins or parts thereof using peptide synthesis and recombinant DNA techniques. This invention also features antibodies and nucleic acids useful as probes to detect C. albicans infection. In addition, vaccine compositions and methods for protection against or treatment of infection by C. albicans are within the scope of this invention.

The nucleotide sequences provided in SEQ ID NO: 1-SEQ ID NO: 14103, a fragment thereof, or a nucleotide sequence at least about 99.5% identical to a sequence contained within SEQ ID NO: 1-SEQ ID NO: 14103 may be “provided” in a variety of media to facilitate their use. As used herein, “provided” refers to a manufacture, other than an isolated nucleic acid molecule, which contains a nucleotide sequence of the present invention, i.e., the nucleotide sequence provided in SEQ ID NO: 1-SEQ ID NO: 14103, a fragment thereof, or a nucleotide sequence at least about 99.5% identical to a sequence contained within SEQ ID NO: 1-SEQ ID NO: 14103. Uses for, and methods of providing, nucleotide sequences in a variety of media are well known in the art (see, e.g., EPO Publication No. EP 0 756 006).

In one application of this embodiment, a nucleotide sequence of the present invention can be recorded on computer readable media. As used herein, “computer readable media” refers to any media which can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage media, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. A person skilled in the art can readily appreciate how any of the presently known computer readable media can be used to create a manufacture comprising computer readable media having recorded thereon a nucleotide sequence of the present invention.

As used herein, “recorded” refers to a process for storing information on computer readable media. A person skilled in the art can readily adopt any of the presently known methods for recording information on computer readable media to generate manufactures comprising the nucleotide sequence information of the present invention.

A variety of data storage structures are available to a person skilled in the art for creating a computer readable medium having recorded thereon a nucleotide sequence of the present invention. The choice of the data storage structure will generally be based on the means chosen to access the stored information. In addition, a variety of data processor programs and formats can be used to store the nucleotide sequence information of the present invention on computer readable media. The sequence information can be represented in a word processing text file formatted in commercially available software such as WordPerfect and Microsoft Word, represented in the form of an ASCII file, or stored in a database application such as DB2, Sybase, Oracle, or the like. A person skilled in the art can readily adapt any number of data processor structuring formats (e.g., text file or database) in order to obtain computer readable media having recorded thereon the nucleotide sequence information of the present invention.
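
As one concrete example of an ASCII text representation, sequences are commonly stored in the FASTA format: a header line beginning with “>” followed by one or more sequence lines. The Python sketch below writes and reads such text in memory. The function names and the 60-character line width are assumptions, and the records are toy sequences, not entries from the Sequence Listing.

```python
from typing import Dict


def to_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Serialize {header: sequence} as FASTA text, wrapping sequence lines at `width`."""
    lines = []
    for header, seq in records.items():
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def from_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text back into {header: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


toy = {"example_1": "ATGC" * 40, "example_2": "GGCCTTAA"}
text = to_fasta(toy)
print(text)
assert from_fasta(text) == toy  # round trip preserves every record
```

A plain-text format like this can be written to any of the media listed above and read by the sequence-analysis programs named below, which is one reason it is a common interchange choice.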

By providing the nucleotide sequence of SEQ ID NO: 1-SEQ ID NO: 14103, a fragment thereof, or a nucleotide sequence at least about 99.5% identical to SEQ ID NO: 1-SEQ ID NO: 14103 in computer readable form, a person skilled in the art can routinely access the coding sequence information for a variety of purposes. Computer software is publicly available which allows a person skilled in the art to access sequence information provided in computer readable media. Examples of such computer software include programs of the “Staden Package”, “DNA Star”, “MacVector”, GCG “Wisconsin Package” (Genetics Computer Group, Madison, Wis.) and “NCBI Toolbox” (National Center for Biotechnology Information). Suitable programs are described, for example, in Martin J. Bishop, ed., Guide to Human Genome Computing, 2d Edition, Academic Press, San Diego, Calif. (1998); and Leonard F. Peruski, Jr., and Anne Harwood Peruski, The Internet and the New Biology: Tools for Genomic and Molecular Research, American Society for Microbiology, Washington, D.C. (1997).

Computer algorithms enable the identification of C. albicans open reading frames (ORFs) within SEQ ID NO: 1-SEQ ID NO: 14103 which are homologous to ORFs or proteins from other organisms. Examples of such similarity-search algorithms include the BLAST [Altschul et al., J. Mol. Biol. 215:403-410 (1990)] and Smith-Waterman [Smith and Waterman (1981) Advances in Applied Mathematics, 2:482-489] search algorithms. Suitable search algorithms are described, for example, in Martin J. Bishop, ed., Guide to Human Genome Computing, 2d Edition, Academic Press, San Diego, Calif. (1998); and Leonard F. Peruski, Jr., and Anne Harwood Peruski, The Internet and the New Biology: Tools for Genomic and Molecular Research, American Society for Microbiology, Washington, D.C. (1997). Such algorithms are utilized on computer systems as exemplified below. The ORFs so identified represent protein-encoding fragments within the C. albicans genome and are useful in producing commercially important proteins, such as enzymes used in fermentation reactions and in the production of commercially useful metabolites.
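
To make the Smith-Waterman idea concrete, the Python sketch below computes the optimal local alignment score of two sequences by dynamic programming. It is a simplified illustration, not the implementation used in this work. The scoring scheme (match +2, mismatch -1, linear gap penalty -2) is an assumption chosen for readability; production tools typically use substitution matrices such as BLOSUM62 together with affine gap penalties, and also report the alignment itself via traceback.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a and b (score only, no traceback)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] placed opposite a gap
            left = curr[j - 1] + gap  # b[j-1] placed opposite a gap
            # Local alignment: a cell never drops below zero, so a poor prefix
            # is abandoned and a new alignment can start anywhere.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

Only two rows of the matrix are kept at a time, so memory grows with the length of one sequence rather than with the product of both lengths.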

The present invention further provides systems, particularly computer-based systems, which contain the sequence information described herein. Such systems are designed to identify commercially important fragments of the C. albicans genome. As used herein, “a computer-based system” refers to the hardware means, software means, and data storage means used to analyze the nucleotide sequence information of the present invention. The minimum hardware means of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A person skilled in the art can readily appreciate that any one of the currently available computer-based systems is suitable for use in the present invention. The computer-based systems of the present invention comprise a data storage means having stored therein a nucleotide sequence of the present invention and the necessary hardware means and software means for supporting and implementing a search means. As used herein, “data storage means” refers to memory which can store nucleotide sequence information of the present invention, or a memory access means which can access manufactures having recorded thereon the nucleotide sequence information of the present invention.

As used herein, “search means” refers to one or more programs which are implemented on the computer-based system to compare a target sequence or target structural motif with the sequence information stored within the data storage means. Search means are used to identify fragments or regions of the C. albicans genome which are similar to, or “match”, a particular target sequence or target motif. A variety of algorithms are known in the art and have been disclosed publicly, and a variety of commercially available software for conducting homology-based similarity searches can be used in the computer-based systems of the present invention. Examples of such software include, but are not limited to, FASTA (GCG Wisconsin Package), Bic_SW (Compugen Bioccelerator), BLASTN2, BLASTP2, BLASTX2 (NCBI) and Motifs (GCG). Suitable software programs are described, for example, in Martin J. Bishop, ed., Guide to Human Genome Computing, 2d Edition, Academic Press, San Diego, Calif. (1998); and Leonard F. Peruski, Jr., and Anne Harwood Peruski, The Internet and the New Biology: Tools for Genomic and Molecular Research, American Society for Microbiology, Washington, D.C. (1997). A person skilled in the art will readily recognize that any one of the available algorithms or implementing software packages for conducting homology searches can be adapted for use in the present computer-based systems.
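
A search means of the kind described above can be as simple as a word index over the stored sequences. The Python sketch below, with hypothetical names and toy contigs, indexes every k-mer of each stored sequence and then uses the target's first k-mer as a seed to find full-length exact matches. Word seeding of this kind is the first step of BLAST-like tools, but those tools go on to extend seeds into scored, possibly inexact alignments, which this sketch does not do. Only the forward strand is searched.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Index = Dict[str, List[Tuple[str, int]]]


def build_index(seqs: Dict[str, str], k: int) -> Index:
    """Map each k-mer to every (sequence id, offset) where it occurs."""
    index: Index = defaultdict(list)
    for sid, seq in seqs.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((sid, pos))
    return index


def exact_hits(seqs: Dict[str, str], index: Index, target: str, k: int) -> List[Tuple[str, int]]:
    """Seed with the target's first k-mer, then confirm that the whole target matches."""
    if len(target) < k:
        raise ValueError("target must be at least k residues long")
    hits = []
    for sid, pos in index.get(target[:k], []):  # .get avoids adding keys to the defaultdict
        if seqs[sid][pos:pos + len(target)] == target:
            hits.append((sid, pos))
    return hits


stored = {"contig_A": "GGATCCATGAAACCCGGGTTT", "contig_B": "ATGAAACCCAAA"}
idx = build_index(stored, k=6)
print(exact_hits(stored, idx, "ATGAAACCC", k=6))
```

Building the index once lets many target sequences be looked up without rescanning the stored data each time, which is the usual motivation for this design.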

As used herein, a “target sequence” can be any DNA or amino acid sequence of six or more nucleotides or two or more amino acids. A person skilled in the art can readily recognize that the longer a target sequence is, the less likely it is to be present as a random occurrence in the database. The most preferred length of a target sequence is from about 10 to 100 amino acids or from about 30 to 300 nucleotide residues. However, it is well recognized that many genes are longer than 500 amino acids, or 1.5 kb, and that commercially important fragments of the C. albicans genome, such as sequence fragments involved in gene expression and protein processing, will often be shorter than 30 nucleotides.
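
The link between target length and chance matches can be quantified under a simple null model. Assuming, as a simplification, that database residues are independent and uniformly distributed over an alphabet of size A, the expected number of chance occurrences of one fixed length-k target on one strand of a database of N residues is E = (N - k + 1) / A^k. The Python sketch below evaluates this for nucleotides (A = 4) and amino acids (A = 20). The database size of 10 million residues is a hypothetical round number, and real sequences are not uniformly random, so the figures are only order-of-magnitude guides.

```python
def expected_chance_hits(db_length: int, k: int, alphabet_size: int = 4) -> float:
    """Expected occurrences of one fixed length-k word in a uniform random sequence."""
    windows = max(db_length - k + 1, 0)  # number of length-k windows on one strand
    return windows / alphabet_size ** k


N = 10_000_000  # hypothetical database size, not a measured genome length
for k in (6, 10, 20, 30):
    print(f"{k:>2} nt: {expected_chance_hits(N, k):.3g}")
print(f" 5 aa: {expected_chance_hits(N, 5, alphabet_size=20):.3g}")
```

Each additional nucleotide divides the expectation by about four, which is why a six-nucleotide target is expected to recur thousands of times by chance in a database of this size while a 20-nucleotide target is essentially unique.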

As used herein, “a target structural motif,” or “target motif,” refers to any rationally selected sequence or combination of sequences in which the sequence(s) are chosen based on a specific functional domain or three-dimensional configuration which is formed upon the folding of the target polypeptide. There is a variety of target motifs known in the art. Protein target motifs include, but are not limited to, enzymatic active sites, membrane-spanning regions, and signal sequences. Nucleic acid target motifs include, but are not limited to, promoter sequences, hairpin structures and inducible expression elements (protein binding sequences).
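
Nucleic acid motifs such as promoter elements are often written as degenerate consensus sequences using IUPAC ambiguity codes (for example, W means A or T). The Python sketch below, with hypothetical function names, converts such a consensus into a regular expression and reports every start position, including overlapping matches. The TATA-box-like consensus TATAWAW is used purely as an illustration. A real motif search would normally also scan the reverse strand and might use position weight matrices rather than exact consensus matching.

```python
import re
from typing import List

# IUPAC nucleotide ambiguity codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_positions(seq: str, consensus: str) -> List[int]:
    """0-based start positions of a degenerate consensus in seq (overlaps allowed)."""
    core = "".join(IUPAC[c] for c in consensus.upper())
    # Wrapping the pattern in a zero-width lookahead lets successive matches overlap.
    pattern = re.compile(f"(?={core})")
    return [m.start() for m in pattern.finditer(seq.upper())]


print(motif_positions("GGTATAAATGG", "TATAWAW"))
```

The lookahead matters for short, repetitive motifs: a plain `finditer` would skip a match that begins inside the previous one.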

A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention. A preferred format for an output means ranks fragments of the C. albicans genome possessing varying degrees of homology to the target sequence or target motif. Such presentation provides a person skilled in the art with a ranking of sequences which contain various amounts of the target sequence or target motif and identifies the degree of homology contained in the identified fragment.
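
The ranked output format can be illustrated with a deliberately simple similarity measure. The Python sketch below scores each stored fragment by its best ungapped identity to the target, taken over all offsets at which the target fits inside the fragment, and then sorts fragments from most to least similar. The function names and the ungapped measure are assumptions made for brevity. A practical system would more likely rank by alignment scores or expectation values reported by a tool such as BLAST.

```python
from typing import Dict, List, Tuple


def best_ungapped_identity(target: str, fragment: str) -> float:
    """Highest fraction of identical positions over all full-overlap offsets.

    Returns 0.0 when the target is longer than the fragment.
    """
    if not target:
        raise ValueError("target must be non-empty")
    best = 0.0
    for off in range(len(fragment) - len(target) + 1):
        window = fragment[off:off + len(target)]
        same = sum(x == y for x, y in zip(target, window))
        best = max(best, same / len(target))
    return best


def rank_fragments(target: str, fragments: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (fragment id, identity) pairs, most similar first."""
    scored = [(fid, best_ungapped_identity(target, seq)) for fid, seq in fragments.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


frags = {"frag_a": "AAAAACGTAC", "frag_b": "TTTTTTTT", "frag_c": "ACGAAC"}
for fid, ident in rank_fragments("ACGTAC", frags):
    print(f"{fid}\t{ident:.1%}")
```

Reporting the identity next to each identifier gives the reader both the ranking and the degree of homology of each fragment, as the paragraph above describes.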

A variety of comparing means can be used to compare a target sequence or target motif with the data storage means to identify sequence fragments of the C. albicans genome. In the present examples, software implementing the BLASTP2 and bic_SW algorithms (Altschul et al., J. Mol. Biol. 215:403-410 (1990); Compugen Bioccelerator) was used to identify open reading frames within the C. albicans genome. A person skilled in the art can readily recognize that any one of the publicly available homology search programs can be used as the search means for the computer-based systems of the present invention. Suitable programs are described, for example, in Martin J. Bishop, ed., Guide to Human Genome Computing, 2d Edition, Academic Press, San Diego, Calif. (1998); and Leonard F. Peruski, Jr., and Anne Harwood Peruski, The Internet and the New Biology: Tools for Genomic and Molecular Research, American Society for Microbiology, Washington, D.C. (1997).
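
The examples above identify open reading frames by homology searching. A complementary and much simpler approach, shown here only as an illustrative substitute rather than the method used in this work, is to scan for stretches that begin with ATG and run to the next in-frame stop codon. The Python sketch below scans the three forward frames only and reports the first ATG of each ORF, ignoring nested downstream ATGs. The function name and the minimum-length parameter are hypothetical. The stop codons TAA, TAG and TGA are not affected by the CUG reassignment in C. albicans.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[int, int]]:
    """Forward-strand ORFs as (start, end) 0-based half-open spans, stop codon included.

    An ORF runs from an ATG to the first in-frame stop; min_codons counts the
    codons before the stop. ORFs without a stop codon are not reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for p in range(frame, len(seq) - 2, 3):
            codon = seq[p:p + 3]
            if start is None and codon == "ATG":
                start = p
            elif start is not None and codon in STOPS:
                if (p - start) // 3 >= min_codons:
                    orfs.append((start, p + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))
```

A homology search, by contrast, can also flag genuine short genes that fall below any length cutoff and reject long ORFs that arise by chance, which is why the two approaches are often combined.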

The invention features C. albicans polypeptides, preferably a substantially pure preparation of a C. albicans polypeptide, or a recombinant C. albicans polypeptide. In preferred embodiments: the polypeptide has biological activity; the polypeptide has an amino acid sequence at least about 60%, 70%, 80%, 90%, 95%, 98%, or 99% identical to an amino acid sequence of the invention contained in the Sequence Listing, preferably it has about 65% sequence identity with an amino acid sequence of the invention contained in the Sequence Listing, and most preferably it has about 92% to about 99% sequence identity with an amino acid sequence of the invention contained in the Sequence Listing; the polypeptide has an amino acid sequence essentially the same as an amino acid sequence of the invention contained in the Sequence Listing; the polypeptide is at least about 5, 10, 20, 50, 100, or 150 amino acid residues in length; the polypeptide includes at least about 5, preferably at least about 10, more preferably at least about 20, more preferably at least about 50, 100, or 150 contiguous amino acid residues of a sequence of the invention contained in the Sequence Listing. In yet another preferred embodiment, an amino acid sequence that differs in sequence identity by about 7% to about 8% from the C. albicans amino acid sequences of the invention contained in the Sequence Listing is also encompassed by the invention.
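
Percent identity depends on how it is normalized, and conventions differ between programs. The Python sketch below assumes an existing pairwise alignment with “-” as the gap character and divides the number of identical, non-gap columns by the full alignment length, gap columns included. The function name is hypothetical, and other tools divide by the shorter sequence length or by the number of ungapped columns, which gives different values for the same alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical, non-gap columns as a percentage of all alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(x == y and x != gap for x, y in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a)


print(percent_identity("MKTALV", "MKTALI"))  # one substitution in six columns
print(percent_identity("MKT-LV", "MKTALI"))  # one gap and one substitution
```

Because the thresholds recited above (for example 60%, 90% or 99%) are sensitive to this choice, the normalization should be stated whenever a percent identity is reported.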

In preferred embodiments: the C. albicans polypeptide is encoded by a nucleic acid of the invention contained in the Sequence Listing, or by a nucleic acid having at least about 60%, 70%, 80%, 90%, 95%, 98%, or 99% homology with a nucleic acid of the invention contained in the Sequence Listing.

In a preferred embodiment, the subject C. albicans polypeptide differs in amino acid sequence at about 1, 2, 3, 5, 10 or more residues from a sequence of the invention contained in the Sequence Listing. The differences, however, are such that the C. albicans polypeptide exhibits a C. albicans biological activity, e.g., the C. albicans polypeptide retains a biological activity of a naturally occurring C. albicans enzyme.

In preferred embodiments, the polypeptide includes all or a fragment of an amino acid sequence of the invention contained in the Sequence Listing, fused, in reading frame, to additional amino acid residues, preferably to residues encoded by genomic DNA 5′ or 3′ to the genomic DNA which encodes a sequence of the invention contained in the Sequence Listing.

In yet other preferred embodiments, the C. albicans polypeptide is a recombinant fusion protein having a first C. albicans polypeptide portion and a second polypeptide portion, e.g., a second polypeptide portion having an amino acid sequence unrelated to C. albicans. The second polypeptide portion can be, e.g., any of glutathione-S-transferase, a DNA binding domain, or a polymerase activating domain. In a preferred embodiment, the fusion protein can be used in a two-hybrid assay.

Polypeptides of the invention include those which arise as a result of alternative transcription events, alternative RNA splicing events, and alternative translational and post-translational events.

In a preferred embodiment, the encoded C. albicans polypeptide differs (e.g., by amino acid substitution, addition or deletion of at least one amino acid residue) in amino acid sequence at about 1, 2, 3, 5, 10 or more residues from a sequence of the invention contained in the Sequence Listing. The differences, however, are such that the encoded C. albicans polypeptide exhibits a C. albicans biological activity, e.g., the encoded C. albicans enzyme retains a biological activity of a naturally occurring C. albicans enzyme.

In preferred embodiments, the encoded polypeptide includes all or a fragment of an amino acid sequence of the invention contained in the Sequence Listing, fused, in reading frame, to additional amino acid residues, preferably to residues encoded by genomic DNA 5′ or 3′ to the genomic DNA which encodes a sequence of the invention contained in the Sequence Listing.

The nucleotide sequences were determined from C. albicans strain SC5314, a clinical isolate originally obtained from a patient with disseminated candidiasis.

Included in the invention are: allelic variations; natural mutants; induced mutants; proteins encoded by DNA that hybridizes under high or low stringency conditions to a nucleic acid which encodes a polypeptide of the invention contained in the Sequence Listing (for definitions of high and low stringency see Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1989, 6.3.1-6.3.6, hereby incorporated by reference); and polypeptides specifically bound by antisera to C. albicans polypeptides, especially by antisera to an active site or binding domain of a C. albicans polypeptide. The invention also includes fragments, preferably biologically active fragments. These and other polypeptides are also referred to herein as C. albicans polypeptide analogs or variants.

The invention further provides nucleic acids, e.g., RNA or DNA, encoding a polypeptide of the invention. This includes double-stranded nucleic acids as well as coding and antisense single strands.

In preferred embodiments, the subject C. albicans nucleic acid will include a transcriptional regulatory sequence, e.g., at least one of a transcriptional promoter or transcriptional enhancer sequence, operably linked to the C. albicans gene sequence, e.g., to render the C. albicans gene sequence suitable for expression in a recombinant host cell.

In yet a further preferred embodiment, the nucleic acid which encodes a C. albicans polypeptide of the invention hybridizes under stringent conditions to a nucleic acid probe corresponding to at least about 8 consecutive nucleotides of the invention contained in the Sequence Listing; more preferably to at least about 12 consecutive nucleotides of the invention contained in the Sequence Listing; more preferably to at least about 20 consecutive nucleotides of the invention contained in the Sequence Listing; more preferably to at least about 40 consecutive nucleotides of the invention contained in the Sequence Listing.
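
Probe selection of the kind described above often starts by tiling candidate oligonucleotides of a chosen length along a sequence and estimating their melting temperature, which guides the choice of hybridization stringency. The Python sketch below uses the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C), a rough rule of thumb intended for short oligonucleotides. The function names, probe length, step size and example sequence are hypothetical, and nearest-neighbor thermodynamic models are preferred when accuracy matters.

```python
from typing import List, Tuple


def wallace_tm(probe: str) -> int:
    """Wallace-rule melting temperature estimate, in degrees Celsius."""
    p = probe.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def tile_probes(seq: str, length: int = 20, step: int = 10) -> List[Tuple[int, str, int]]:
    """Return (start, probe, estimated Tm) for windows of `length` taken every `step` nt."""
    return [
        (start, seq[start:start + length], wallace_tm(seq[start:start + length]))
        for start in range(0, len(seq) - length + 1, step)
    ]


for start, probe, tm in tile_probes("ATGGCTAGCTAGGATCCGATCGATTACGGA", length=20, step=5):
    print(start, probe, tm)
```

Choosing probes with similar estimated Tm values lets them be used together under a single set of hybridization and wash conditions.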

In another aspect, the invention provides a substantially pure nucleic acid having a nucleotide sequence which encodes a C. albicans polypeptide. In preferred embodiments: the encoded polypeptide has biological activity; the encoded polypeptide has an amino acid sequence at least about 60%, 70%, 80%, 90%, 95%, 98%, or 99% homologous to an amino acid sequence of the invention contained in the Sequence Listing; the encoded polypeptide has an amino acid sequence essentially the same as an amino acid sequence of the invention contained in the Sequence Listing; the encoded polypeptide is at least about 5, 10, 20, 50, 100, or 150 amino acids in length; the encoded polypeptide comprises at least about 5, preferably at least about 10, more preferably at least about 20, more preferably at least about 50, 100, or 150 contiguous amino acids of the invention contained in the Sequence Listing.

In another aspect, the invention encompasses: a vector including a nucleic acid which encodes a C. albicans polypeptide or a C. albicans polypeptide variant as described herein; a host cell transfected with the vector; and a method of producing a recombinant C. albicans polypeptide or C. albicans polypeptide variant, including culturing the cell, e.g., in a cell culture medium, and isolating the C. albicans polypeptide or C. albicans polypeptide variant, e.g., from the cell or from the cell culture medium.

One embodiment of the invention is directed to substantially isolated nucleic acids. Nucleic acids of the invention include sequences at least about 8 nucleotides in length, more preferably at least about 12 nucleotides in length, even more preferably at least about 15-20 nucleotides in length, that correspond to a subsequence of any one of SEQ ID NO: 1-SEQ ID NO: 14103 or complements thereof. Alternatively, the nucleic acids comprise sequences contained within any ORF (open reading frame), including a complete protein-coding sequence, of which any of SEQ ID NO: 1-SEQ ID NO: 14103 forms a part. The invention encompasses sequence-conservative variants and function-conservative variants of these sequences. The nucleic acids may be DNA, RNA, DNA/RNA duplexes, peptide nucleic acid (PNA), or derivatives thereof.
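
Whether a nucleic acid corresponds to a subsequence of a listed sequence “or complements thereof” can be checked mechanically. The Python sketch below, with hypothetical names and a toy reference, accepts a DNA fragment of at least a minimum length (8 nucleotides by default, following the lower bound above) if it occurs in the reference as written or as its reverse complement, that is, the complementary strand read 5′ to 3′. It handles only unambiguous A/C/G/T sequences.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complementary strand of dna, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def corresponds_to(fragment: str, reference: str, min_len: int = 8) -> bool:
    """True if fragment (at least min_len nt) occurs in reference on either strand."""
    frag, ref = fragment.upper(), reference.upper()
    if len(frag) < min_len:
        return False
    return frag in ref or reverse_complement(frag) in ref


ref = "ATGGCTAGCTAGGATCCGATCG"
print(corresponds_to("GCTAGCTA", ref))                        # same strand
print(corresponds_to(reverse_complement("GATCCGATCG"), ref))  # complementary strand
print(corresponds_to("GCTAG", ref))                           # shorter than min_len
```

Checking the reverse complement rather than the simple complement matters because both strands are conventionally written 5′ to 3′.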

In another aspect, the invention features a purified recombinant nucleic acid having at least about 50%, 60%, 70%, 80%, 90%, 95%, 98%, or 99% homology with a sequence of the invention contained in the Sequence Listing.

The invention also encompasses recombinant DNA (including DNA cloning and expression vectors) comprising these C. albicans-derived sequences; host cells comprising such DNA, including fungal, bacterial, yeast, plant, insect, and mammalian host cells; and methods for producing expression products comprising RNA and polypeptides encoded by the C. albicans sequences. These methods are carried out by incubating a host cell comprising a C. albicans-derived nucleic acid sequence under conditions in which the sequence is expressed. The host cell may be native or recombinant. The polypeptides can be obtained by (a) harvesting the incubated cells to produce a cell fraction and a medium fraction; and (b) recovering the C. albicans polypeptide from the cell fraction, the medium fraction, or both. The polypeptides can also be made by in vitro translation.

In another aspect, the invention features nucleic acids capable of binding mRNA of C. albicans. Such nucleic acid is capable of acting as antisense nucleic acid to control the translation of mRNA of C. albicans. A further aspect features a nucleic acid which is capable of binding specifically to a C. albicans nucleic acid. These nucleic acids are also referred to herein as complements and have utility as probes and as capture reagents.
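
An antisense oligonucleotide is the reverse complement of its mRNA target site, so that the two strands pair antiparallel. The Python sketch below, with hypothetical names and a toy mRNA, derives an antisense RNA for a chosen window of an mRNA and confirms Watson-Crick pairing position by position. It ignores G·U wobble pairs, chemical modifications, and the accessibility of the site within folded mRNA, all of which matter in real antisense design.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_for(mrna: str, start: int, length: int) -> str:
    """Antisense RNA (5'->3') complementary to mrna[start:start + length]."""
    site = mrna.upper()[start:start + length]
    return site.translate(RNA_COMPLEMENT)[::-1]


def pairs_fully(antisense: str, site: str) -> bool:
    """Check antiparallel Watson-Crick pairing between an antisense strand and its site."""
    if len(antisense) != len(site):
        return False
    # Reading the antisense strand 3'->5' lines it up base by base with the site 5'->3'.
    return all((s, a) in PAIRS for s, a in zip(site.upper(), reversed(antisense.upper())))


mrna = "GGAUGGCUUCAAGGCUAA"
oligo = antisense_for(mrna, start=2, length=9)
print(oligo, pairs_fully(oligo, mrna[2:11]))
```

The same pairing check can be used to confirm that a probe or capture reagent is fully complementary to its intended site before synthesis.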

In another aspect, the invention features an expression system comprising an open reading frame corresponding to C. albicans nucleic acid. The nucleic acid further comprises a control sequence compatible with an intended host. The expression system is useful for making polypeptides corresponding to C. albicans nucleic acid.

In another aspect, the invention encompasses: a vector including a nucleic acid which encodes a C. albicans polypeptide or a C. albicans polypeptide variant as described herein; a host cell transfected with the vector; and a method of producing a recombinant C. albicans polypeptide or C. albicans polypeptide variant, including culturing the cell, e.g., in a cell culture medium, and isolating the C. albicans polypeptide or C. albicans polypeptide variant, e.g., from the cell or from the cell culture medium.

Yet another embodiment of the invention encompasses reagents for detecting fungal infection, including C. albicans infection, which comprise at least one C. albicans-derived nucleic acid defined by any one of SEQ ID NO: 1-SEQ ID NO: 14103, or sequence-conservative or function-conservative variants thereof. Alternatively, the diagnostic reagents comprise polypeptide sequences that are contained within any open reading frames (ORFs), including complete protein-coding sequences, contained within any of SEQ ID NO: 1-SEQ ID NO: 14103; polypeptide sequences contained within any of SEQ ID NO: 14104-SEQ ID NO: 28206; polypeptides of which any of the above sequences forms a part; or antibodies directed against any of the above peptide sequences or function-conservative variants and/or fragments thereof.

The invention further provides antibodies, preferably monoclonal antibodies, which specifically bind to the polypeptides of the invention. Methods are also provided for producing antibodies in a host animal. The methods of the invention comprise immunizing an animal with at least one C. albicans-derived immunogenic component, wherein the immunogenic component comprises one or more of the polypeptides encoded by any one of SEQ ID NO: 1-SEQ ID NO: 14103 or sequence-conservative or function-conservative variants thereof; or polypeptides that are contained within any ORFs, including complete protein-coding sequences, of which any of SEQ ID NO: 1-SEQ ID NO: 14103 forms a part; or polypeptide sequences contained within any of SEQ ID NO: 14104-SEQ ID NO: 28206; or polypeptides of which any of SEQ ID NO: 14104-SEQ ID NO: 28206 forms a part. Host animals include any warm-blooded animal, including without limitation mammals and birds. Such antibodies have utility as reagents for immunoassays to evaluate the abundance and distribution of C. albicans-specific antigens.

In yet another aspect, the invention provides diagnostic methods for detecting C. albicans antigenic components or anti-C. albicans antibodies in a sample. C. albicans antigenic components are detected by a process comprising: (i) contacting a sample suspected to contain a fungal antigenic component with a fungal-specific antibody, under conditions in which a stable antigen-antibody complex can form between the antibody and fungal antigenic components in the sample; and (ii) detecting any antigen-antibody complex formed in step (i), wherein detection of an antigen-antibody complex indicates the presence of at least one fungal antigenic component in the sample. In different embodiments of this method, the antibodies used are directed against a sequence encoded by any of SEQ ID NO: 1-SEQ ID NO: 14103 or sequence-conservative or function-conservative variants thereof, or against a polypeptide sequence contained in any of SEQ ID NO: 14104-SEQ ID NO: 28206 or function-conservative variants thereof.
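
In practice, the detection in step (ii) usually yields a numeric signal (for example, an optical density in an enzyme-linked immunoassay) that must be converted into a positive or negative call. One widely used convention, assumed here rather than specified by this document, sets the cutoff at the mean of the negative-control readings plus three standard deviations. The Python sketch below applies that rule; the function names, control values and sample readings are hypothetical, and each assay would need its own validated cutoff.

```python
from statistics import mean, stdev
from typing import Sequence


def cutoff_from_negatives(negatives: Sequence[float], n_sd: float = 3.0) -> float:
    """Mean of negative-control signals plus n_sd sample standard deviations."""
    if len(negatives) < 2:
        raise ValueError("need at least two negative controls")
    return mean(negatives) + n_sd * stdev(negatives)


def is_positive(signal: float, cutoff: float) -> bool:
    """Treat a reading above the cutoff as evidence of an antigen-antibody complex."""
    return signal > cutoff


cutoff = cutoff_from_negatives([0.10, 0.12, 0.08])  # hypothetical control readings
for sample, od in {"sample_1": 0.45, "sample_2": 0.11}.items():
    print(sample, "positive" if is_positive(od, cutoff) else "negative")
```

Raising `n_sd` makes false positives rarer at the cost of missing weak true positives, so the choice trades specificity against sensitivity.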

In yet another aspect, the invention provides a method for detecting antifungal-specific antibodies in a sample, which comprises: (i) contacting a sample suspected to contain antifungal-specific antibodies with a C. albicans antigenic component, under conditions in which a stable antigen-antibody complex can form between the C. albicans antigenic component and antifungal antibodies in the sample; and (ii) detecting any antigen-antibody complex formed in step (i), wherein detection of an antigen-antibody complex indicates the presence of antifungal antibodies in the sample. In different embodiments of this method, the antigenic component is encoded by a sequence contained in any of SEQ ID NO: 1-SEQ ID NO: 14103 or sequence-conservative and function-conservative variants thereof, or is a polypeptide sequence contained in any of SEQ ID NO: 14104-SEQ ID NO: 28206 or function-conservative variants thereof.

In another aspect, the invention features a method of generating vaccines for immunizing an individual against C. albicans. The method includes: immunizing a subject with a C. albicans polypeptide, e.g., a surface or secreted polypeptide, or a combination of such peptides or active portion(s) thereof, and a pharmaceutically acceptable carrier. Such vaccines have therapeutic and prophylactic utilities.

In another aspect, the invention features a method of evaluating a compound, e.g., a polypeptide, e.g., a fragment of a host cell polypeptide, for the ability to bind a C. albicans polypeptide. The method includes contacting the candidate compound with a C. albicans polypeptide and determining whether the compound binds or otherwise interacts with the C. albicans polypeptide. Compounds that bind C. albicans polypeptides are candidates as activators or inhibitors of the fungal life cycle. These assays can be performed in vitro or in vivo.

In another aspect, the invention features a method of evaluating a compound, e.g., a polypeptide, e.g., a fragment of a host cell polypeptide, for the ability to bind a C. albicans nucleic acid, e.g., DNA or RNA. The method includes contacting the candidate compound with a C. albicans nucleic acid and determining whether the compound binds or otherwise interacts with the C. albicans nucleic acid. Compounds that bind C. albicans nucleic acids are candidates as activators or inhibitors of the fungal life cycle. These assays can be performed in vitro or in vivo.

A particularly preferred embodiment of the invention is directed to a method of screening test compounds for anti-fungal activity, which method comprises: selecting as a target a fungal-specific sequence that is essential to the viability of a fungal species; contacting a test compound with said target sequence; and selecting those test compounds that bind to said target sequence as potential anti-fungal candidates. In one embodiment, the target sequence selected is specific to a single species, or even a single strain, e.g., C. albicans strain SC5314. In a second embodiment, the target sequence is common to at least two species of fungi. In a third embodiment, the target sequence is common to a family of fungi. The target sequence may be a nucleic acid sequence or a polypeptide sequence. Methods employing sequences common to more than one species of microorganism may be used to screen candidates for broad-spectrum anti-fungal activity.

The invention also provides methods for preventing or treating disease caused by certain fungi, including C. albicans, which are carried out by administering to an animal in need of such treatment, in particular a warm-blooded vertebrate, including but not limited to birds and mammals, a compound that specifically inhibits or interferes with the function of a fungal polypeptide or nucleic acid. In a particularly preferred embodiment, the mammal to be treated is human.

DETAILED DESCRIPTION OF THE INVENTION

The sequences of the present invention include the specific nucleic acid and amino acid sequences set forth in the Sequence Listing that forms a part of the present specification, and which are designated SEQ ID NO: 1-SEQ ID NO: 28206. Use of the terms “SEQ ID NO: 1-SEQ ID NO: 14103”, “SEQ ID NO: 14104-SEQ ID NO: 28206”, “the sequences depicted in Table 2”, and like terms is intended, for convenience, to refer to each SEQ ID NO individually, and is not intended to refer to the genus of these sequences unless such reference would be indicated. In other words, it is a shorthand for listing all of these sequences individually. The invention encompasses each sequence individually, as well as any combination thereof.

Definitions

“Nucleic acid” or “polynucleotide” as used herein refers to purine- and pyrimidine-containing polymers of any length, either polyribonucleotides or polydeoxyribonucleotides or mixed polyribo-polydeoxyribonucleotides. This includes single- and double-stranded molecules, i.e., DNA-DNA, DNA-RNA and RNA-RNA hybrids, as well as “peptide nucleic acids” (PNA) formed by conjugating bases to an amino acid backbone. This also includes nucleic acids containing modified bases.

A nucleic acid or polypeptide sequence that is “derived from” a designated sequence refers to a sequence that corresponds to a region of the designated sequence. For nucleic acid sequences, this encompasses sequences that are homologous or complementary to the sequence, as well as “sequence-conservative variants” and “function-conservative variants.” For polypeptide sequences, this encompasses “function-conservative variants.” Sequence-conservative variants are those in which a change of one or more nucleotides in a given codon position results in no alteration in the amino acid encoded at that position. Function-conservative variants are those in which a given amino acid residue in a polypeptide has been changed without altering the overall conformation and function of the native polypeptide, including, but not limited to, replacement of an amino acid with one having similar physico-chemical properties (such as, for example, acidic, basic, hydrophobic, and the like). “Function-conservative” variants also include any polypeptides that have the ability to elicit antibodies specific to a designated polypeptide.
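Whether a nucleotide change is sequence-conservative depends on the genetic code used to read it. The following Python sketch assumes the standard genetic code except that CUG (CTG in DNA) is decoded as serine, the C. albicans non-universal decoding noted in the Background; the names `translate` and `is_sequence_conservative` are hypothetical and used only for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
# Assumed C. albicans reading: identical to the standard code except CTG -> serine.
CANDIDA_TABLE = {**CODON_TABLE, "CTG": "S"}


def translate(dna, table=CANDIDA_TABLE):
    """Translate whole codons of a DNA sequence ('*' marks a stop codon)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(table[dna[i:i + 3]] for i in range(0, usable, 3))


def is_sequence_conservative(original, variant, table=CANDIDA_TABLE):
    """True if the sequences differ in nucleotides but encode the same amino acids."""
    if len(original) != len(variant) or len(original) % 3:
        raise ValueError("expects two equal-length sequences made of whole codons")
    differs = original.upper() != variant.upper()
    return differs and translate(original, table) == translate(variant, table)
```

Under the standard code, CTT to CTG would be a silent change (both leucine); under the assumed C. albicans table it converts leucine to serine, so the sketch reports it as not sequence-conservative.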

A “C. albicans-derived” nucleic acid or polypeptide sequence may or may not be present in other fungal species, and may or may not be present in all C. albicans strains. This term is intended to refer to the source from which the sequence was originally isolated. Thus, a C. albicans-derived polypeptide, as used herein, may be used, e.g., as a target to screen for a broad-spectrum antifungal agent, or to search for homologous proteins in other species of fungi or in eukaryotic organisms such as humans.

A purified or isolated polypeptide or a substantially pure preparation of a polypeptide are used interchangeably herein and, as used herein, mean a polypeptide that has been separated from other proteins, lipids, and nucleic acids with which it naturally occurs. Preferably, the polypeptide is also separated from substances, e.g., antibodies or gel matrix, e.g., polyacrylamide, which are used to purify it. Preferably, the polypeptide constitutes at least about 10, 20, 50, 70, 80 or 95% dry weight of the purified preparation. Preferably, the preparation contains sufficient polypeptide to allow protein sequencing, which is preferably at least about 1, 10, or 100 mg of the polypeptide.

A purified preparation of cells refers to, in the case of plant or animal cells, an in vitro preparation of cells and not an entire intact plant or animal. In the case of cultured cells or microbial cells, it consists of a preparation of at least about 10% and more preferably at least about 50% of the subject cells.

A purified or isolated or a substantially pure nucleic acid, e.g., a substantially pure DNA (these terms are used interchangeably herein), is a nucleic acid that is one or both of the following: not immediately contiguous with both of the coding sequences with which it is immediately contiguous (i.e., one at the 5′ end and one at the 3′ end) in the naturally-occurring genome of the organism from which the nucleic acid is derived; or substantially free of a nucleic acid with which it occurs in the organism from which the nucleic acid is derived. The term includes, for example, a recombinant DNA that is incorporated into a vector, e.g., into an autonomously replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote, or that exists as a separate molecule (e.g., a cDNA or a genomic DNA fragment produced by PCR or restriction endonuclease treatment) independent of other DNA sequences. Substantially pure DNA also includes a recombinant DNA that is part of a hybrid gene encoding additional C. albicans DNA sequence.

A “contig” as used herein is a nucleic acid representing a continuous stretch of genomic sequence of an organism.

An “open reading frame”, also referred to herein as ORF, is a region of nucleic acid that encodes a polypeptide. This region may represent a portion of a coding sequence or an entire coding sequence, and can be defined from stop codon to stop codon or from start codon to stop codon.

As used herein, a “coding sequence” is a nucleic acid that is transcribed into messenger RNA and/or translated into a polypeptide when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by a translation start codon at the 5′ terminus and a translation stop codon at the 3′ terminus. A coding sequence can include, but is not limited to, messenger RNA, synthetic DNA, and recombinant nucleic acid sequences.

A “complement” of a nucleic acid as used herein refers to an anti-parallel or antisense sequence that participates in Watson-Crick base-pairing with the original sequence.

A “gene product” is a protein or structural RNA which is specifically encoded by a gene.

As used herein, the term “probe” refers to a nucleic acid, peptide or other chemical entity which specifically binds to a molecule of interest. Probes are often associated with or capable of associating with a label. A label is a chemical moiety capable of detection. Typical labels comprise dyes, radioisotopes, luminescent and chemiluminescent moieties, fluorophores, enzymes, precipitating agents, amplification sequences, and the like. Similarly, a nucleic acid, peptide or other chemical entity which specifically binds to a molecule of interest and immobilizes such molecule is referred to herein as a “capture ligand”. Capture ligands are typically associated with or capable of associating with a support such as nitrocellulose, glass, nylon membranes, beads, particles and the like. The specificity of hybridization is dependent on conditions such as the base pair composition of the nucleotides, and the temperature and salt concentration of the reaction. These conditions are readily discernible to one of ordinary skill in the art using routine experimentation.

“Homologous” refers to the sequence similarity or sequence identity between two polypeptides or between two nucleic acid molecules. When a position in both of the two compared sequences is occupied by the same base or amino acid monomer subunit, e.g., if a position in each of two DNA molecules is occupied by adenine, then the molecules are homologous at that position. The percent of homology between two sequences is a function of the number of matching or homologous positions shared by the two sequences divided by the number of positions compared, multiplied by 100. For example, if 6 of 10 of the positions in two sequences are matched or homologous, then the two sequences are 60% homologous. By way of example, the DNA sequences ATTGCC and TATGGC share 50% homology. Generally, a comparison is made when two sequences are aligned to give maximum homology.

Nucleic acids are hybridizable to each other when at least one strand of a nucleic acid can anneal to the other nucleic acid under defined stringency conditions. Stringency of hybridization is determined by: (a) the temperature at which hybridization and/or washing is performed; and (b) the ionic strength and polarity of the hybridization and washing solutions. Hybridization requires that the two nucleic acids contain complementary sequences; depending on the stringency of hybridization, however, mismatches may be tolerated. Typically, hybridization of two sequences at high stringency (such as, for example, in a solution of 0.5×SSC at 65° C.) requires that the sequences be essentially completely homologous. Conditions of intermediate stringency (such as, for example, 2×SSC at 65° C.) and low stringency (such as, for example, 2×SSC at 55° C.) require correspondingly less overall complementarity between the hybridizing sequences. (1×SSC is 0.15 M NaCl, 0.015 M Na citrate.)
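To make the salt dependence of stringency concrete, the Python sketch below computes the sodium concentration of n×SSC from the composition given above, assuming the citrate is trisodium citrate (three Na+ per citrate), and feeds it into one commonly quoted empirical estimate of duplex melting temperature, Tm = 81.5 + 16.6·log10[Na+] + 0.41·(%GC) − 675/N, where N is the probe length. That formula is not part of this specification, is only approximate, and describes perfectly matched duplexes; the function names are hypothetical.

```python
import math


def ssc_sodium_molar(ssc_strength):
    """Molar Na+ in n x SSC, assuming 1x SSC = 0.15 M NaCl + 0.015 M trisodium citrate."""
    return ssc_strength * (0.15 + 3 * 0.015)


def estimated_tm(probe, ssc_strength):
    """Approximate Tm (deg C) of a perfectly matched duplex, using the empirical
    salt-adjusted formula 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 675/N."""
    probe = probe.upper()
    if not probe:
        raise ValueError("probe must be non-empty")
    gc_percent = 100.0 * sum(base in "GC" for base in probe) / len(probe)
    sodium = ssc_sodium_molar(ssc_strength)
    return 81.5 + 16.6 * math.log10(sodium) + 0.41 * gc_percent - 675.0 / len(probe)
```

For any given probe, moving from 2×SSC to 0.5×SSC lowers the estimate by 16.6·log10(4), roughly 10° C., so hybridizing at 65° C. in 0.5×SSC sits closer to the melting point and tolerates fewer mismatches, consistent with its description above as high stringency.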

The terms peptides, proteins, and polypeptides are used interchangeably herein.

As used herein, the term “surface protein” refers to all surface-accessible proteins, e.g., inner and outer membrane proteins, proteins adhering to the cell wall, and secreted proteins.

A polypeptide has C. albicans biological activity if it has one, two, and preferably more of the following properties: (1) when expressed in the course of a C. albicans infection, it can promote or mediate the attachment of C. albicans to a cell; (2) it has an enzymatic activity, or a structural or regulatory function, characteristic of a C. albicans protein; or (3) the gene that encodes it can rescue a lethal mutation in a C. albicans gene. A polypeptide also has biological activity if it is an antagonist, agonist, or super-agonist of a polypeptide having one of the above-listed properties.

A biologically active fragment or analog is one having an in vivo or in vitro activity which is characteristic of the C. albicans polypeptides of the invention contained in the Sequence Listing, or of other naturally occurring C. albicans polypeptides, e.g., one or more of the biological activities described herein. Especially preferred are fragments which exist in vivo, e.g., fragments which arise from post-transcriptional processing or which arise from translation of alternatively spliced RNAs. Fragments include those expressed in native or endogenous cells as well as those made in expression systems, e.g., in CHO (Chinese Hamster Ovary) cells. Because peptides such as C. albicans polypeptides often exhibit a range of physiological properties and because such properties may be attributable to different portions of the molecule, a useful C. albicans fragment or C. albicans analog is one which exhibits a biological activity in any biological assay for C. albicans activity. The fragment or analog preferably possesses at least 10%, more preferably at least 40%, and most preferably at least 60%, 70%, 80% or 90% of the activity of the corresponding C. albicans polypeptide in any in vivo or in vitro assay.

Analogs can differ from naturally occurring C. albicans polypeptides in amino acid sequence or in ways that do not involve sequence, or both. Non-sequence modifications include changes in acetylation, methylation, phosphorylation, carboxylation, or glycosylation. Preferred analogs include C. albicans polypeptides (or biologically active fragments thereof) whose sequences differ from the wild-type sequence by one or more conservative amino acid substitutions or by one or more non-conservative amino acid substitutions, deletions, or insertions which do not substantially diminish the biological activity of the C. albicans polypeptide. Conservative substitutions typically include the substitution of one amino acid for another with similar characteristics, e.g., substitutions within the following groups: valine, glycine; glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid; asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine. Other conservative substitutions can be made in view of Table 1 below.
TABLE 1 CONSERVATIVE AMINO ACID REPLACEMENTS
For Amino Acid (Code): Replace with any of
Alanine (A): D-Ala, Gly, beta-Ala, L-Cys, D-Cys
Arginine (R): D-Arg, Lys, D-Lys, homo-Arg, D-homo-Arg, Met, Ile, D-Met, D-Ile, Orn, D-Orn
Asparagine (N): D-Asn, Asp, D-Asp, Glu, D-Glu, Gln, D-Gln
Aspartic Acid (D): D-Asp, D-Asn, Asn, Glu, D-Glu, Gln, D-Gln
Cysteine (C): D-Cys, S-Me-Cys, Met, D-Met, Thr, D-Thr
Glutamine (Q): D-Gln, Asn, D-Asn, Glu, D-Glu, Asp, D-Asp
Glutamic Acid (E): D-Glu, D-Asp, Asp, Asn, D-Asn, Gln, D-Gln
Glycine (G): Ala, D-Ala, Pro, D-Pro, β-Ala, Acp
Isoleucine (I): D-Ile, Val, D-Val, Leu, D-Leu, Met, D-Met
Leucine (L): D-Leu, Val, D-Val, Leu, D-Leu, Met, D-Met
Lysine (K): D-Lys, Arg, D-Arg, homo-Arg, D-homo-Arg, Met, D-Met, Ile, D-Ile, Orn, D-Orn
Methionine (M): D-Met, S-Me-Cys, Ile, D-Ile, Leu, D-Leu, Val, D-Val
Phenylalanine (F): D-Phe, Tyr, D-Thr, L-Dopa, His, D-His, Trp, D-Trp, trans-3,4, or 5-phenylproline, cis-3,4, or 5-phenylproline
Proline (P): D-Pro, L-1-thioazolidine-4-carboxylic acid, D- or L-1-oxazolidine-4-carboxylic acid
Serine (S): D-Ser, Thr, D-Thr, allo-Thr, Met, D-Met, Met(O), D-Met(O), L-Cys, D-Cys
Threonine (T): D-Thr, Ser, D-Ser, allo-Thr, Met, D-Met, Met(O), D-Met(O), Val, D-Val
Tyrosine (Y): D-Tyr, Phe, D-Phe, L-Dopa, His, D-His
Valine (V): D-Val, Leu, D-Leu, Ile, D-Ile, Met, D-Met
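The substitution groups listed in the paragraph preceding Table 1 can be applied mechanically to compare a native sequence with a variant. The Python sketch below covers only those eight groups (one-letter codes), not the D-amino acid and non-standard replacements of Table 1, and handles substitutions only, not insertions or deletions; the function names are hypothetical.

```python
# Substitution groups from the text, as one-letter codes; a residue may appear in several groups.
CONSERVATIVE_GROUPS = [
    {"V", "G"}, {"G", "A"}, {"V", "I", "L"}, {"D", "E"},
    {"N", "Q"}, {"S", "T"}, {"K", "R"}, {"F", "Y"},
]


def is_conservative(original, replacement):
    """True if both residues fall within at least one listed group."""
    if original == replacement:
        return True
    return any(original in group and replacement in group for group in CONSERVATIVE_GROUPS)


def substitutions(native, variant):
    """List (1-based position, native residue, variant residue, conservative?) for each change."""
    if len(native) != len(variant):
        raise ValueError("sketch handles substitutions only, not insertions or deletions")
    return [
        (i + 1, a, b, is_conservative(a, b))
        for i, (a, b) in enumerate(zip(native.upper(), variant.upper()))
        if a != b
    ]
```

For example, `substitutions("MKVLD", "MRVIE")` reports three changes (K to R, L to I, D to E), all within the listed groups.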

Other analogs within the invention are those with modifications which increase peptide stability; such analogs may contain, for example, one or more non-peptide bonds (which replace the peptide bonds) in the peptide sequence. Also included are: analogs that include residues other than naturally occurring L-amino acids, e.g., D-amino acids or non-naturally occurring or synthetic amino acids, e.g., β or γ amino acids; and cyclic analogs.

As used herein, the term “fragment”, as applied to a C. albicans analog, will ordinarily be at least about 20 residues, more typically at least about 40 residues, preferably at least about 60 residues in length. Fragments of C. albicans polypeptides can be generated by methods known to those skilled in the art. The ability of a candidate fragment to exhibit a biological activity of C. albicans polypeptide can be assessed by methods known to those skilled in the art as described herein. Also included are C. albicans polypeptides containing residues that are not required for biological activity of the peptide or that result from alternative mRNA splicing or alternative protein processing events.

An “immunogenic component” as used herein is a moiety, such as a C. albicans polypeptide, analog or fragment thereof, that is capable of eliciting a humoral and/or cellular immune response in a host animal.

An “antigenic component” as used herein is a moiety, such as a C. albicans polypeptide, analog or fragment thereof, that is capable of binding to a specific antibody with sufficiently high affinity to form a detectable antigen-antibody complex.

The term “antibody” as used herein is intended to include fragments thereof which are specifically reactive with C. albicans polypeptides.

As used herein, the term “cell-specific promoter” means a DNA sequence that serves as a promoter, i.e., regulates expression of a selected DNA sequence operably linked to the promoter, and which effects expression of the selected DNA sequence in specific cells of a tissue. The term also covers so-called “leaky” promoters, which regulate expression of a selected DNA primarily in one tissue, but cause expression in other tissues as well.

Misexpression, as used herein, refers to a non-wild type pattern of gene expression. It includes: expression at non-wild type levels, i.e., over- or under-expression; a pattern of expression that differs from wild type in terms of the time or stage at which the gene is expressed, e.g., increased or decreased expression (as compared with wild type) at a predetermined developmental period or stage; a pattern of expression that differs from wild type in terms of increased expression (as compared with wild type) in a predetermined cell type or tissue type; a pattern of expression that differs from wild type in terms of the splicing size, amino acid sequence, post-translational modification, or biological activity of the expressed polypeptide; and a pattern of expression that differs from wild type in terms of the effect of an environmental stimulus or extracellular stimulus on expression of the gene, e.g., a pattern of increased or decreased expression (as compared with wild type) in the presence of an increase or decrease in the strength of the stimulus.

As used herein, “host cells” and other such terms denoting microorganisms or higher eukaryotic cell lines cultured as unicellular entities refer to cells which can become or have been used as recipients for a recombinant vector or other transfer DNA, and include the progeny of the original cell which has been transfected. It is understood by individuals skilled in the art that the progeny of a single parental cell may not necessarily be completely identical in genomic or total DNA complement to the original parent, due to accidental or deliberate mutation.

As used herein, the term “control sequence” refers to a nucleic acid having a base sequence which is recognized by the host organism to effect the expression of encoded sequences to which it is ligated. The nature of such control sequences differs depending upon the host organism; in prokaryotes, such control sequences generally include a promoter, ribosomal binding site, terminators, and in some cases operators; in eukaryotes, such control sequences generally include promoters, terminators and, in some instances, enhancers. The term control sequence is intended to include, at a minimum, all components whose presence is necessary for expression, and may also include additional components whose presence is advantageous, for example, leader sequences.

As used herein, the term “operably linked” refers to sequences joined or ligated to function in their intended manner. For example, a control sequence is operably linked to a coding sequence by ligation in such a way that expression of the coding sequence is achieved under conditions compatible with the control sequence and host cell.

The “metabolism” of a substance, as used herein, means any aspect of the expression, function, action, or regulation of the substance. The metabolism of a substance includes modifications, e.g., covalent or non-covalent modifications, of the substance. The metabolism of a substance includes modifications, e.g., covalent or non-covalent modifications, that the substance induces in other substances. The metabolism of a substance also includes changes in the distribution of the substance. The metabolism of a substance includes changes the substance induces in the distribution of other substances.

A “sample” as used herein refers to a biological sample, such as, for example, tissue or fluid isolated from an individual (including without limitation plasma, serum, cerebrospinal fluid, lymph, tears, saliva and tissue sections) or from in vitro cell culture constituents, as well as samples from the environment.

Technical and scientific terms used herein have the meanings commonly understood by one of ordinary skill in the art to which the present invention pertains, unless otherwise defined. Reference is made herein to various methodologies known to those of skill in the art. Publications and other materials setting forth such known methodologies to which reference is made are incorporated herein by reference in their entireties as though set forth in full. The practice of the invention will employ, unless otherwise indicated, conventional techniques of chemistry, molecular biology, microbiology, recombinant DNA, and immunology, which are within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Sambrook, Fritsch, and Maniatis, Molecular Cloning: A Laboratory Manual, 2nd ed. (1989); DNA Cloning: A Practical Approach, Volumes I and II, 1985 (D. N. Glover ed.); Oligonucleotide Synthesis, 1984 (M. J. Gait ed.); Nucleic Acid Hybridization, 1984 (B. D. Hames & S. J. Higgins eds.); the series Methods in Enzymology (Academic Press, Inc.), particularly Vol. 154 and Vol. 155 (Wu and Grossman, eds.); PCR-A Practical Approach, 1991 (McPherson, Quirke, and Taylor, eds.); Immunology, 2d Edition, 1989, Roitt et al., C. V. Mosby Company, New York; Advanced Immunology, 2d Edition, 1991, Male et al., Gower Medical Publishing, New York; Transcription and Translation, 1984 (Hames and Higgins eds.); Animal Cell Culture, 1986 (R. I. Freshney ed.); Immobilized Cells and Enzymes, 1986 (IRL Press); Perbal, 1984, A Practical Guide to Molecular Cloning; Gene Transfer Vectors for Mammalian Cells, 1987 (J. H. Miller and M. P. Calos eds., Cold Spring Harbor Laboratory); Martin J. Bishop, ed., Guide to Human Genome Computing, 2d Edition, Academic Press, San Diego, Calif. (1998); and Leonard F. Peruski, Jr., and Anne Harwood Peruski, The Internet and the New Biology: Tools for Genomic and Molecular Research, American Society for Microbiology, Washington, D.C. (1997).

Any suitable materials and/or methods known to those of skill can be utilized in carrying out the present invention; however, preferred materials and/or methods are described. Materials, reagents and the like to which reference is made in the following description and examples are obtainable from commercial sources, unless otherwise noted.

C. albicans Genomic Sequence

This invention provides nucleotide sequences of the genome of C. albicans strain SC5314, which thus comprise a DNA sequence library of C. albicans genomic DNA. The detailed description that follows provides nucleotide sequences of C. albicans, and also describes how the sequences were obtained and how ORFs and protein-coding sequences can be identified. Also described are methods of using the disclosed C. albicans sequences, including diagnostic and therapeutic applications. Furthermore, the library can be used as a database for identification and comparison of medically important sequences in this and other strains of C. albicans.

To determine the genomic sequence of C. albicans, DNA from strain SC5314 of C. albicans was isolated by Zymolyase digestion, sodium dodecyl sulfate lysis, potassium acetate precipitation, phenol:chloroform extraction and ethanol precipitation (Soll, D. R., T. Srikantha and S. R. Lockhart: Characterizing Developmentally Regulated Genes in C. albicans, In Microbial Genome Methods, K. W. Adolph, editor. CRC Press, New York, pp. 17-37). The DNA was sheared hydrodynamically using an HPLC (Oefner et al., 1996) to an insert size of 2000-3000 bp. After size fractionation by gel electrophoresis, the fragments were blunt-ended, ligated to adapter oligonucleotides and cloned into the pGTC (Thomann) vector to construct a “shotgun” subclone library.

DNA sequencing was achieved using established ABI sequencing methods on ABI377 automated DNA sequencers. The cloning and sequencing procedures are described in more detail in the Exemplification.

Individual sequence reads were assembled using PHRAP (P. Green, Abstracts of DOE Human Genome Program Contractor-Grantee Workshop V, January 1996, p. 157). The average contig length was about 3-4 kb.

All subsequent sequencing steps were likewise performed on ABI377 automated DNA sequencers.

A variety of approaches is used to order the contigs so as to obtain a continuous sequence representing the entire C. albicans genome. Synthetic oligonucleotides are designed that are complementary to sequences at the end of each contig. These oligonucleotides may be hybridized to libraries of C. albicans genomic DNA in, for example, lambda phage vectors or plasmid vectors to identify clones that contain sequences corresponding to the junctional regions between individual contigs. Such clones are then used to isolate template DNA, and the same oligonucleotides are used as primers in polymerase chain reaction (PCR) to amplify junctional fragments, the nucleotide sequence of which is then determined.

The C. albicans sequences were analyzed for the presence of open reading frames (ORFs) comprising at least about 180 nucleotides. Because these ORFs were identified from stop codon to stop codon, it should be understood that they may not correspond exactly to the ORF of a naturally-occurring C. albicans polypeptide. These ORFs may contain start codons which indicate the initiation of protein synthesis of a naturally-occurring C. albicans polypeptide. Such start codons within the ORFs provided herein can be identified by those of ordinary skill in the relevant art, and the resulting ORF and the encoded C. albicans polypeptide are within the scope of this invention. For example, within the ORFs a codon such as AUG or GUG (encoding methionine or valine) which is part of the initiation signal for protein synthesis can be identified, and the portion of an ORF corresponding to a naturally-occurring C. albicans polypeptide can be recognized. The predicted coding regions were defined by evaluating the coding potential of such sequences with the program GENEMARK (Borodovsky and McIninch, 1993, Comput. Chem. 17:123).
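The stop-to-stop ORF scan described above can be sketched in Python. The sketch searches all three frames on both strands for stop-free stretches of at least 180 nt, and a second helper locates the first in-frame ATG or GTG start codon within a stretch. It assumes the TAA, TAG and TGA stop codons (the C. albicans CUG reassignment does not involve a stop codon), reports reverse-strand coordinates on the reverse complement, and is not the GENEMARK coding-potential analysis; all names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def stop_to_stop_orfs(seq, min_len=180):
    """Return (strand, frame, start, end) for stop-free stretches of >= min_len nt.

    Coordinates are 0-based and end-exclusive on the strand searched ('-' means the
    reverse complement); the bounding stop codons are not included in a stretch.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOPS:
                    if i - start >= min_len:
                        orfs.append((strand, frame, start, i))
                    start = i + 3  # the next stretch begins after this stop codon
            # A stretch that runs off the end of the sequence has no closing stop codon.
            end = frame + max(0, len(s) - frame) // 3 * 3
            if end - start >= min_len:
                orfs.append((strand, frame, start, end))
    return orfs


def first_start_codon(s, start, end, starts=("ATG", "GTG")):
    """Index of the first in-frame start codon in s[start:end] (s on the same strand), or None."""
    s = s.upper()
    for i in range(start, end - 2, 3):
        if s[i:i + 3] in starts:
            return i
    return None
```

Trimming a stop-to-stop stretch at its first start codon gives a start-to-stop candidate; choosing the true initiation codon still requires the kind of judgment described in the text.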

Each predicted ORF amino acid sequence was compared with all sequences found in the current GENBANK, SWISS-PROT, and PIR databases using the BLAST algorithm. BLAST identifies local alignments between the ORF sequence and sequences in the databank and estimates the probability that each alignment would occur by chance (Altschul et al., 1990, J. Mol. Biol. 215:403-410). Homologous ORFs (probabilities less than 10⁻⁵ by chance) and ORFs that are probably non-homologous (probabilities greater than 10⁻⁵ by chance) but have good codon usage were identified. Both homologous sequences and non-homologous sequences with good codon usage are likely to encode proteins and are encompassed by the invention.
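The decision rule in this paragraph can be summarized in a few lines of Python. The 10⁻⁵ cutoff comes from the text; the record layout, the field names and the `codon_usage_ok` flag (a stand-in for whatever codon-usage assessment is applied, which the text does not specify) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

CHANCE_PROBABILITY_CUTOFF = 1e-5  # threshold stated in the text


@dataclass
class OrfSearchResult:
    orf_id: str
    best_probability: Optional[float]  # best BLAST chance probability; None if no hit reported
    codon_usage_ok: bool               # hypothetical outcome of a separate codon-usage check


def classify(result: OrfSearchResult) -> str:
    """Label an ORF as homologous, non-homologous with good codon usage, or unclassified."""
    if result.best_probability is not None and result.best_probability < CHANCE_PROBABILITY_CUTOFF:
        return "homologous"
    if result.codon_usage_ok:
        return "non-homologous, good codon usage"
    return "unclassified"
```

Both of the first two labels correspond to the ORFs the text describes as likely to encode proteins.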

It is to be understood that non-protein-coding sequences contained within SEQ ID NO: 1-SEQ ID NO: 14103 are also within the scope of the invention. Such sequences include, without limitation, sequences important for replication, recombination, transcription and translation. Non-limiting examples include promoters and regulatory binding sites involved in regulation of gene expression, and 5′- and 3′-untranslated sequences (e.g., ribosome-binding sites) that form part of mRNA molecules.

Preferred sequences are those that are useful in diagnostic and/or therapeutic applications. Diagnostic applications include without limitation nucleic-acid-based and antibody-based methods for detecting C. albicans infection. Therapeutic applications include without limitation vaccines, passive immunotherapy, and drug treatments directed against gene products that are essential for growth and/or replication. In a particularly preferred aspect of the invention, the nucleic acids encode protein-coding sequences which share homology to other fungal sequences, lack homology to all eukaryotic sequences, and are essential to the viability of fungi. Such sequences comprise a library of valuable target sequences for drug discovery, in particular, targets which may be used to identify broad-spectrum antifungal agents.
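The preferred target criteria can be expressed as a filter over annotated candidates. In this Python sketch the annotations (which other fungal species carry a homolog, whether any non-fungal eukaryotic homolog was found, and whether the gene is essential) are assumed to come from prior homology searches and essentiality tests. Reading "lack homology to all eukaryotic sequences" as "no homolog outside the fungi" is an interpretive assumption, and all names and example records are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TargetCandidate:
    name: str
    fungal_species_with_homolog: set = field(default_factory=set)  # other fungi carrying a homolog
    non_fungal_eukaryotic_homolog: bool = False                     # e.g., a human homolog was found
    essential: bool = False                                          # required for fungal viability


def broad_spectrum_targets(candidates, min_other_fungi=1):
    """Names of essential candidates shared with >= min_other_fungi other fungi and absent outside fungi."""
    return [
        c.name
        for c in candidates
        if c.essential
        and not c.non_fungal_eukaryotic_homolog
        and len(c.fungal_species_with_homolog) >= min_other_fungi
    ]
```

Raising `min_other_fungi` favors broader-spectrum targets; setting it to 0 also admits targets specific to a single species, mirroring the screening embodiments described earlier.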

C. albicans Nucleic Acids

The present invention provides a library of C. albicans-derived nucleic acid sequences. The libraries provide probes, primers, and markers that can be used in epidemiological studies. The present invention also provides a library of C. albicans-derived nucleic acid sequences that comprise or encode targets for therapeutic drugs.

The nucleic acids of this invention are obtained directly from the DNA of the above-referenced C. albicans strain by using the polymerase chain reaction (PCR). See “PCR, A Practical Approach” (McPherson, Quirke, and Taylor, eds., IRL Press, Oxford, UK, 1991) for details about the PCR. High-fidelity PCR can be used to ensure a faithful DNA copy prior to expression. In addition, the authenticity of amplified products can be verified by conventional sequencing methods. Clones carrying the desired sequences described in this invention may also be obtained by screening the libraries by means of the PCR or by hybridization of synthetic oligonucleotide probes to filter lifts of the library colonies or plaques as known in the art (see, e.g., Sambrook et al., Molecular Cloning, A Laboratory Manual, 2nd edition, 1989, Cold Spring Harbor Press, NY).

It is also possible to obtain nucleic acids encoding C. albicans polypeptides from a cDNA library in accordance with protocols herein described. A cDNA encoding a C. albicans polypeptide can be obtained by isolating total mRNA from an appropriate strain. Double-stranded cDNAs can then be prepared from the total mRNA. Subsequently, the cDNAs can be inserted into a suitable plasmid or viral (e.g., bacteriophage) vector using any one of a number of known techniques. Genes encoding C. albicans polypeptides can also be cloned using established polymerase chain reaction techniques in accordance with the nucleotide sequence information provided by the invention. The nucleic acids of the invention can be DNA or RNA. Preferred nucleic acids of the invention are contained in the Sequence Listing.

The nucleic acids of the invention can also be chemically synthesized using standard techniques. Various methods of chemically synthesizing polydeoxynucleotides are known, including solid-phase synthesis which, like peptide synthesis, has been fully automated in commercially available DNA synthesizers (see, e.g., Itakura et al. U.S. Pat. No. 4,598,049; Caruthers et al. U.S. Pat. No. 4,458,066; and Itakura U.S. Pat. Nos. 4,401,796 and 4,373,071, incorporated by reference herein).

In another example, DNA can be chemically synthesized using, e.g., the phosphoramidite solid support method of Matteucci et al., 1981, J. Am. Chem. Soc. 103:3185, the method of Yoo et al., 1989, J. Biol. Chem. 264:17078, or other well known methods. This can be done by sequentially linking a series of oligonucleotide cassettes comprising pairs of synthetic oligonucleotides, as described below.

Nucleic acids isolated or synthesized in accordance with features of the present invention are useful, by way of example, without limitation, as probes, primers, capture ligands, antisense genes and for developing expression systems for the synthesis of proteins and peptides corresponding to such sequences. As probes, primers, capture ligands and antisense agents, the nucleic acid normally consists of all or part (approximately twenty or more nucleotides for specificity as well as the ability to form stable hybridization products) of the nucleic acids of the invention contained in the Sequence Listing. These uses are described in further detail below.

Probes

A nucleic acid isolated or synthesized in accordance with the sequence of the invention contained in the Sequence Listing can be used as a probe to specifically detect C. albicans. With the sequence information set forth in the present application, sequences of about twenty or more nucleotides are identified which provide the desired inclusivity and exclusivity with respect to C. albicans and extraneous nucleic acids likely to be encountered under hybridization conditions. More preferably, the sequence will comprise at least about twenty to thirty nucleotides to convey stability to the hybridization product formed between the probe and the intended target molecules.
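The rationale for a probe length of about twenty nucleotides can be illustrated with a rough calculation of how often a given sequence would be expected to occur by chance. The sketch below assumes a random genome with equal, independent base frequencies and a haploid genome length on the order of 15 Mb; both are simplifying assumptions (real genomes are not random, so actual match frequencies differ), and the function name is hypothetical.

```python
def expected_random_hits(genome_length: int, probe_length: int) -> float:
    """Expected number of chance exact matches to one probe sequence.

    Assumes a random genome with equal base frequencies (0.25 each) and
    counts both strands, so there are about 2 * genome_length positions.
    """
    return 2 * genome_length / 4 ** probe_length


if __name__ == "__main__":
    genome = 15_000_000  # assumed haploid genome size, for illustration only
    for n in (10, 15, 20, 25):
        print(n, expected_random_hits(genome, n))
```

Under these assumptions a 10-mer is expected to occur by chance about 29 times, whereas the expected chance count for a 20-mer is on the order of 10^-5, which is consistent with the preference stated above for probes of about twenty or more nucleotides.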

Sequences larger than about 1000 nucleotides in length are difficult to synthesize but can be generated by recombinant DNA techniques. Individuals skilled in the art will readily recognize that the nucleic acids, for use as probes, can be provided with a label to facilitate detection of a hybridization product.

Nucleic acid isolated or synthesized in accordance with the sequence of the invention contained in the Sequence Listing can also be useful as probes to detect homologous regions (especially homologous genes) of other Candida species using appropriate stringency hybridization conditions as described herein.

Capture Ligand

For use as a capture ligand, the nucleic acid selected in the manner described above with respect to probes can be readily associated with a support. The manner in which nucleic acid is associated with supports is well known. Nucleic acid having twenty or more nucleotides in a sequence of the invention contained in the Sequence Listing has utility to separate C. albicans nucleic acid of one strain from the nucleic acid of another strain as well as from other organisms. Nucleic acid having twenty or more nucleotides in a sequence of the invention contained in the Sequence Listing can also have utility to separate other Candida species from each other and from other organisms. Preferably, the sequence will comprise at least about twenty nucleotides to convey stability to the hybridization product formed between the capture ligand and the intended target molecules. Sequences larger than 1000 nucleotides in length are difficult to synthesize but can be generated by recombinant DNA techniques.

Primers

Nucleic acids isolated or synthesized in accordance with the sequences described herein have utility as primers for the amplification of C. albicans nucleic acid. These nucleic acids may also have utility as primers for the amplification of nucleic acids in other Candida species. With respect to polymerase chain reaction (PCR) techniques, nucleic acid sequences of ≥ about 10-15 nucleotides of the invention contained in the Sequence Listing have utility in conjunction with suitable enzymes and reagents to create copies of C. albicans nucleic acid. More preferably, the sequence will comprise at least about twenty nucleotides to convey stability to the hybridization product formed between the primer and the intended target molecules. Binding conditions of primers greater than about 100 nucleotides are often more difficult to control to obtain specificity. High fidelity PCR can be used to ensure a faithful DNA copy prior to expression. In addition, amplified products can be checked by conventional sequencing methods.
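When choosing primers of the lengths discussed above, a quick estimate of GC content and melting temperature is often used as a first screen. The sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) in °C, which is only a rule of thumb for short oligonucleotides (roughly 14 to 20 nt) and ignores salt, primer concentration and nearest-neighbor effects; the function names and the example primer are hypothetical.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G + C bases in a DNA primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C).

    A rule of thumb for short oligonucleotides only; it does not model salt,
    primer concentration or nearest-neighbor stacking.
    """
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer
print(len(primer), gc_fraction(primer), wallace_tm(primer))  # 20 0.5 60
```

For longer primers, or where specificity is critical, a nearest-neighbor Tm calculation is generally preferred over this approximation.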

The copies can be used in diagnostic assays to detect specific sequences, including genes from C. albicans and/or other Candida species. The copies can also be incorporated into cloning and expression vectors to generate polypeptides corresponding to the nucleic acid synthesized by PCR, as is described in greater detail herein.

The nucleic acids of the present invention find use as templates for the recombinant production of C. albicans-derived peptides or polypeptides.

Antisense

Nucleic acid or nucleic acid-hybridizing derivatives isolated or synthesized in accordance with the sequences described herein have utility as antisense agents to prevent the expression of C. albicans genes. These sequences also have utility as antisense agents to prevent expression of genes of other Candida species.

In one embodiment, nucleic acid or derivatives corresponding to C. albicans nucleic acids are loaded into a suitable carrier such as a liposome or bacteriophage for introduction into fungal cells. For example, a nucleic acid having about twenty or more nucleotides is capable of binding to fungal nucleic acid or fungal messenger RNA. Preferably, the antisense nucleic acid comprises at least about 20 nucleotides to provide the necessary stability of a hybridization product of non-naturally occurring nucleic acid and fungal nucleic acid and/or fungal messenger RNA. Nucleic acid having a sequence greater than about 1000 nucleotides in length is difficult to synthesize but can be generated by recombinant DNA techniques. Methods for loading antisense nucleic acid in liposomes are known in the art as exemplified, for example, in U.S. Pat. No. 4,241,046 issued Dec. 23, 1980 to Papahadjopoulos et al.
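An antisense agent pairs with its target in antiparallel orientation, so its sequence is the reverse complement of the targeted mRNA segment. The sketch below shows only that sequence relationship; it does not model target-site selection, chemical modification or delivery, and the helper name is hypothetical.

```python
_COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def antisense_oligo(target: str) -> str:
    """Return a DNA antisense oligo (written 5'->3') for a target given 5'->3'.

    The target may be mRNA (U) or the corresponding DNA sense strand (T);
    the result is the reverse complement, written with T.
    """
    return "".join(_COMPLEMENT[base] for base in reversed(target.upper()))


print(antisense_oligo("AUGGCC"))  # GGCCAT
```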

The present invention encompasses isolated polypeptides and nucleic acids derived from C. albicans that are useful as reagents for diagnosis of fungal infection, components of effective anti-fungal vaccines, and/or as targets for anti-fungal drugs, including anti-C. albicans drugs.

Expression of C. albicans Nucleic Acids

Table 2, which is appended herewith and which forms part of the present specification, provides a list of open reading frames (ORFs) in both strands and a putative identification of the particular function of a polypeptide which is encoded by each ORF, based on the homology match (determined by the BLAST algorithm) of the predicted polypeptide with known proteins encoded by ORFs in other organisms. An ORF is a region of nucleic acid which encodes a polypeptide. This region may represent a portion of a coding sequence or a total sequence and was determined from stop to stop codons. The first column contains a designation for the contig from which each ORF was identified (numbered arbitrarily). Each contig represents a continuous stretch of the genomic sequence of the organism. The second column lists the ORF designation. The third and fourth columns list the SEQ ID numbers for the nucleic acid and amino acid sequences corresponding to each ORF, respectively. The fifth and sixth columns list the length of the nucleic acid ORF and the length of the amino acid ORF, respectively. The nucleotide sequence corresponding to each ORF begins at the first nucleotide immediately following a stop codon and ends at the nucleotide immediately preceding the next downstream stop codon in the same reading frame. It will be recognized by one skilled in the art that the natural translation initiation sites will correspond to ATG, GTG, or TTG codons located within the ORFs. The natural initiation sites depend not only on the sequence of a start codon but also on the context of the DNA sequence adjacent to the start codon. Usually, a recognizable ribosome binding site is found within 20 nucleotides upstream from the initiation codon. In some cases where genes are translationally coupled and coordinately expressed together in “operons,” ribosome binding sites are not present, but the initiation codon of a downstream gene may occur very close to, or overlap, the stop codon of an upstream gene in the same operon.
The correct start codons can generally be identified without undue experimentation because only a few codons need be tested. It is recognized that the translational machinery in bacteria initiates all polypeptide chains with the amino acid methionine, regardless of the sequence of the start codon. In some cases, polypeptides are post-translationally modified, resulting in an N-terminal amino acid other than methionine in vivo. The seventh column provides, where available, either a public database accession number or our own sequence name. The eighth and ninth columns provide metrics for assessing the likelihood of the homology match (determined by the BLASTP2 algorithm), as is known in the art, to the genes indicated in the eleventh column when the designated ORF was compared against a non-redundant comprehensive protein database. Specifically, the eighth column represents the “Blast Score” for the match (a higher score is a better match), and the ninth column represents the “P-value” for the match (the probability that such a match could have occurred by chance; the lower the value, the more likely the match is valid). If a BLASTP2 score of less than 46 was obtained, no P-value is reported in the table. Column ten provides the name of the organism that was identified as having the closest homology match. The eleventh column provides, where available, the Swissprot accession number (SP), the locus name (LN), the Organism (OR), Source of variant (SR), E.C. number (EC), the gene name (GN), the product name (PN), the Function Description (FN), Left End (LE), Right End (RE), Coding Direction (DI), and the description (DE) or notes (NT) for each ORF. Information that is not preceded by a code designation in the eleventh column represents a description of the ORF.
This information allows one of ordinary skill in the art to determine a potential use for each identified coding sequence and, as a result, to use the polypeptides of the present invention for commercial and industrial purposes.
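The stop-to-stop ORF definition used for Table 2 can be expressed as a short scan over the six reading frames. In the sketch below, which is illustrative and is not the procedure actually used to build Table 2, each reported region begins immediately after a stop codon and ends immediately before the next in-frame stop codon; regions not bounded by stop codons on both sides are skipped, and the minimum length is an arbitrary assumed parameter. Coordinates on the minus strand refer to positions in the reverse-complement string, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMP = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMP)[::-1]


def stop_to_stop_orfs(seq: str, min_len: int = 300):
    """Return (strand, frame, start, end) for each stop-to-stop ORF.

    start is the first base after a stop codon and end is the first base of the
    next in-frame stop codon (0-based, half-open), so end - start is the ORF length.
    Minus-strand coordinates refer to the reverse-complement string.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            after_prev_stop = None  # no bounding stop seen yet in this frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    if after_prev_stop is not None and i - after_prev_stop >= min_len:
                        orfs.append((strand, frame, after_prev_stop, i))
                    after_prev_stop = i + 3
    return orfs


example = "TAA" + "GCT" * 100 + "TAG"  # hypothetical: two in-frame stops 300 nt apart
print(stop_to_stop_orfs(example))  # [('+', 0, 3, 303)]
```

As the table description notes, the natural start codon lies somewhere inside each such region and must be located separately.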

Using the information provided in SEQ ID NO: 1-SEQ ID NO: 14103, SEQ ID NO: 14104-SEQ ID NO: 28206 and in Table 2 together with routine cloning and sequencing methods, one of ordinary skill in the art will be able to clone and sequence all the nucleic acid fragments of interest, including open reading frames (ORFs) encoding a large variety of proteins of C. albicans.

Nucleic acids isolated or synthesized in accordance with the sequences described herein have utility to generate polypeptides. The nucleic acid of the invention exemplified in SEQ ID NO: 1-SEQ ID NO: 14103 and in Table 2, or fragments of said nucleic acid encoding active portions of C. albicans polypeptides, can be cloned into suitable vectors or used to isolate nucleic acid. The isolated nucleic acid is combined with suitable DNA linkers and cloned into a suitable vector.

The function of a specific gene or operon can be ascertained by expression in a fungal strain under conditions where the activity of the gene product(s) specified by the gene or operon in question can be specifically measured. Alternatively, a gene product may be produced in large quantities in an expressing strain for use as an antigen, an industrial reagent, for structural studies, etc. This expression can be accomplished in a mutant strain which lacks the activity of the gene to be tested, or in a strain that does not produce the same gene product(s). This includes, but is not limited to, eucaryotic species such as the yeast Saccharomyces cerevisiae or Candida putida, Methanobacterium strains or other Archaea, and Eubacteria such as E. coli, B. subtilis, S. aureus, S. pneumoniae or Pseudomonas putida. In some cases the expression host will utilize the natural C. albicans promoter whereas in others, it will be necessary to drive the gene with a promoter sequence derived from the expressing organism (e.g., an E. coli beta-galactosidase promoter for expression in E. coli).

To express a gene product using the natural C. albicans promoter, a procedure such as the following can be used. A restriction fragment containing the gene of interest, together with its associated natural promoter element and regulatory sequences (identified using the DNA sequence data), is cloned into an appropriate recombinant plasmid containing an origin of replication that functions in the host organism and an appropriate selectable marker. This can be accomplished by a number of procedures known to those skilled in the art. It is most preferably done by cutting the plasmid and the fragment to be cloned with the same restriction enzyme to produce compatible ends that can be ligated to join the two pieces together. The recombinant plasmid is introduced into the host organism by, for example, electroporation, and cells containing the recombinant plasmid are identified by selection for the marker on the plasmid. Expression of the desired gene product is detected using an assay specific for that gene product.
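Choosing a restriction fragment that carries a gene together with its promoter begins with locating recognition sites in the sequence data. The sketch below finds sites on one strand and reports fragment lengths after complete digestion of a linear sequence. It assumes a palindromic recognition site (so scanning one strand suffices) and takes the cut position within the site as an input; EcoRI (GAATTC, cut after the first G) and the example sequence are used only for illustration, and the function names are hypothetical.

```python
import re


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of a recognition site, overlapping matches included."""
    return [m.start() for m in re.finditer(f"(?={re.escape(site.upper())})", seq.upper())]


def digest_linear(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after complete digestion of a linear DNA.

    cut_offset is the top-strand cut position measured from the start of the
    site (1 for EcoRI, G^AATTC). Assumes a palindromic site.
    """
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


dna = "AAAAGAATTCAAAAAAGAATTCAA"  # hypothetical sequence with two EcoRI sites
print(find_sites(dna, "GAATTC"))        # [4, 16]
print(digest_linear(dna, "GAATTC", 1))  # [5, 12, 7]
```

A fragment is a good cloning candidate when sites for the chosen enzyme flank the gene and its promoter region without falling inside them.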

In the case of a gene that requires a different promoter, the body of the gene (coding sequence) is specifically excised and cloned into an appropriate expression plasmid. This subcloning can be done by several methods, but is most easily accomplished by PCR amplification of a specific fragment and ligation into an expression plasmid after treating the PCR product with a restriction enzyme or exonuclease to create suitable ends for cloning.

A suitable host cell for expression of a gene can be any procaryotic or eucaryotic cell. Suitable methods for transforming host cells can be found in Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press (1989)), and other laboratory textbooks.

For example, a host cell transfected with a nucleic acid vector directing expression of a nucleotide sequence encoding a C. albicans polypeptide can be cultured under appropriate conditions to allow expression of the polypeptide to occur. Suitable media for cell culture are well known in the art. Polypeptides of the invention can be isolated from cell culture medium, host cells, or both using techniques known in the art for purifying proteins, including ion-exchange chromatography, gel filtration chromatography, ultrafiltration, electrophoresis, and immunoaffinity purification with antibodies specific for such polypeptides. Additionally, in many situations, polypeptides can be produced by chemical or enzymatic cleavage of a native protein (e.g., tryptic digestion) and the cleavage products can then be purified by standard techniques.
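Where a polypeptide is to be produced by tryptic digestion, the expected cleavage products can be predicted from the amino acid sequence. The sketch below applies the commonly used simplified rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P); it ignores missed cleavages and other exceptions, and the function name and example sequence are hypothetical.

```python
import re


def tryptic_peptides(protein: str) -> list[str]:
    """Split a protein sequence after K or R, except when followed by P."""
    # Zero-width split points: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", protein.upper())
    return [p for p in pieces if p]  # drop the empty string left by a C-terminal K/R


print(tryptic_peptides("MAKPLRGKAA"))  # ['MAKPLR', 'GK', 'AA']
```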

In the case of membrane bound proteins, these can be isolated from a host cell by contacting a membrane-associated protein fraction with a detergent forming a solubilized complex, where the membrane-associated protein is no longer entirely embedded in the membrane fraction and is solubilized at least to an extent which allows it to be chromatographically isolated from the membrane fraction. Chromatographic techniques which can be used in the final purification step are known in the art and include hydrophobic interaction, lectin affinity, ion exchange, dye affinity and immunoaffinity.

One strategy to maximize recombinant C. albicans peptide expression in E. coli is to express the protein in a host bacterium with an impaired capacity to proteolytically cleave the recombinant protein (Gottesman, S., Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990) 119-128). Another strategy would be to alter the nucleic acid encoding a C. albicans peptide to be inserted into an expression vector so that the individual codons for each amino acid would be those preferentially utilized in highly expressed E. coli proteins (Wada et al., (1992) Nuc. Acids Res. 20:2111-2118). Such alteration of nucleic acids of the invention can be carried out by standard DNA synthesis techniques.
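Before a C. albicans coding sequence is expressed in E. coli, it can be scanned for codons that may need to be altered. The sketch below flags two things: codons commonly cited as rare in E. coli (the set used here, AGA, AGG, ATA, CTA, CCC and GGA, is an illustrative assumption rather than a complete codon-usage analysis), and CTG codons, which C. albicans translates as serine but E. coli translates as leucine, as noted in the Background. The function name and example sequence are hypothetical.

```python
RARE_IN_E_COLI = {"AGA", "AGG", "ATA", "CTA", "CCC", "GGA"}  # illustrative set


def scan_for_e_coli(cds: str) -> dict[str, list[int]]:
    """Return 0-based codon indices of rare E. coli codons and of CTG codons.

    Reads the coding sequence in frame from its first base.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return {
        "rare": [k for k, c in enumerate(codons) if c in RARE_IN_E_COLI],
        # CTG encodes Ser in C. albicans but Leu in E. coli; recoding it to a
        # Ser codon (e.g., TCT) preserves the native residue.
        "ctg": [k for k, c in enumerate(codons) if c == "CTG"],
    }


print(scan_for_e_coli("ATGAGACTGCCCTAA"))  # {'rare': [1, 3], 'ctg': [2]}
```

Flagged positions would then be recoded by the standard DNA synthesis techniques mentioned above; the choice of replacement codons depends on the codon-usage table adopted.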

The present invention provides a library of C. albicans-derived nucleic acid sequences. The libraries provide probes, primers, and markers which can be used in epidemiological studies. The present invention also provides a library of C. albicans-derived nucleic acid sequences which comprise or encode targets for therapeutic drugs.

Nucleic acids comprising any of the sequences disclosed herein or sub-sequences thereof can be prepared by standard methods using the nucleic acid sequence information provided in SEQ ID NO: 1-SEQ ID NO: 14103. For example, DNA can be chemically synthesized using, e.g., the phosphoramidite solid support method of Matteucci et al., 1981, J. Am. Chem. Soc. 103:3185, the method of Yoo et al., 1989, J. Biol. Chem. 264:17078, or other well known methods. This can be done by sequentially linking a series of oligonucleotide cassettes comprising pairs of synthetic oligonucleotides, as described below.

Of course, due to the degeneracy of the genetic code, many different nucleotide sequences can encode polypeptides having the amino acid sequences defined by SEQ ID NO: 14104-SEQ ID NO: 28206 or sub-sequences thereof. The codons can be selected for optimal expression in prokaryotic or eukaryotic systems. Such degenerate variants are also encompassed by this invention.
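The number of degenerate nucleotide sequences encoding a given polypeptide is the product, over its residues, of the number of codons for each amino acid. The sketch below computes that count under the standard genetic code and under a variant in which CTG encodes serine, the C. albicans decoding noted in the Background (corresponding to NCBI translation table 12, the alternative yeast nuclear code). Stop codons are not counted, and the names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod

BASES = "TCAG"
# Standard code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
CTG_SER = {**STANDARD, "CTG": "S"}  # CTG read as serine, as in C. albicans


def count_encodings(peptide: str, code: dict[str, str]) -> int:
    """Number of distinct coding sequences (without a stop codon) for a peptide."""
    codons_per_aa = Counter(code.values())
    return prod(codons_per_aa[aa] for aa in peptide.upper())


print(count_encodings("LS", STANDARD))  # 36
print(count_encodings("LS", CTG_SER))   # 35
```

Even a short peptide therefore has many possible encodings, which is what leaves room to choose codons suited to a particular expression host.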

Insertion of nucleic acids (typically DNAs) encoding the polypeptides of the invention into a vector is easily accomplished when the termini of both the DNAs and the vector comprise compatible restriction sites. If this cannot be done, it may be necessary to modify the termini of the DNAs and/or vector by digesting back single-stranded DNA overhangs generated by restriction endonuclease cleavage to produce blunt ends, or to achieve the same result by filling in the single-stranded termini with an appropriate DNA polymerase.

Alternatively, any site desired may be produced, e.g., by ligating nucleotide sequences (linkers) onto the termini. Such linkers may comprise specific oligonucleotide sequences that define desired restriction sites. Restriction sites can also be generated by the use of the polymerase chain reaction (PCR). See, e.g., Saiki et al., 1988, Science 239:48. The cleaved vector and the DNA fragments may also be modified if required by homopolymeric tailing.

The nucleic acids of the invention may be isolated directly from cells. Alternatively, the polymerase chain reaction (PCR) method can be used to produce the nucleic acids of the invention, using either chemically synthesized strands or genomic material as templates. Primers used for PCR can be synthesized using the sequence information provided herein and can further be designed to introduce appropriate new restriction sites, if desirable, to facilitate incorporation into a given vector for recombinant expression.
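Primers that introduce new restriction sites are usually built from a 3′ portion that matches the template and a 5′ tail that carries the site. The sketch below assembles such a pair for a coding sequence; the BamHI (GGATCC) and EcoRI (GAATTC) sites, the 20-nt annealing length and the short 5′ "GC" extension are illustrative assumptions rather than values taken from this specification, and the function names are hypothetical. It also reports whether either site already occurs inside the insert, since an internal site would be cut during digestion.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMP)[::-1]


def cloning_primers(cds: str, fwd_site: str = "GGATCC", rev_site: str = "GAATTC",
                    anneal_len: int = 20, extension: str = "GC"):
    """Return (forward, reverse, internal_site_found); primers are written 5'->3'."""
    cds = cds.upper()
    forward = extension + fwd_site + cds[:anneal_len]
    # The reverse primer anneals to the top strand, so its template-matching part
    # is the reverse complement of the last anneal_len bases of the insert.
    reverse = extension + rev_site + reverse_complement(cds[-anneal_len:])
    internal = fwd_site in cds or rev_site in cds
    return forward, reverse, internal


fwd, rev, clash = cloning_primers("ATGAAACCCGGGTTTTAA", anneal_len=6)
print(fwd, rev, clash)  # GCGGATCCATGAAA GCGAATTCTTAAAA False
```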

The nucleic acids of the present invention may be flanked by natural C. albicans regulatory sequences, or may be associated with heterologous sequences, including promoters, enhancers, response elements, signal sequences, polyadenylation sequences, introns, 5′- and 3′-noncoding regions, and the like. The nucleic acids may also be modified by many means known in the art. Non-limiting examples of such modifications include methylation, “caps”, substitution of one or more of the naturally occurring nucleotides with an analog, internucleotide modifications such as, for example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoroamidates, carbamates, etc.) and with charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.). Nucleic acids may contain one or more additional covalently linked moieties, such as, for example, proteins (e.g., nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), intercalators (e.g., acridine, psoralen, etc.), chelators (e.g., metals, radioactive metals, iron, oxidative metals, etc.), and alkylators. PNAs are also included. The nucleic acid may be derivatized by formation of a methyl or ethyl phosphotriester or an alkyl phosphoramidate linkage. Furthermore, the nucleic acid sequences of the present invention may also be modified with a label capable of providing a detectable signal, either directly or indirectly. Exemplary labels include radioisotopes, fluorescent molecules, biotin, and the like.

The invention also provides nucleic acid vectors comprising the disclosed C. albicans-derived sequences or derivatives or fragments thereof. A large number of vectors, including plasmid and fungal vectors, have been described for replication and/or expression in a variety of eukaryotic and prokaryotic hosts, and may be used for cloning or protein expression.

The encoded C. albicans polypeptides may be expressed by using many known vectors, such as pUC plasmids, pET plasmids (Novagen, Inc., Madison, Wis.), or pRSET or pREP (Invitrogen, San Diego, Calif.), and many appropriate host cells, using methods disclosed or cited herein or otherwise known to those skilled in the relevant art. The particular choice of vector/host is not critical to the practice of the invention.

Recombinant cloning vectors will often include one or more replication systems for cloning or expression, one or more markers for selection in the host, e.g., antibiotic resistance, and one or more expression cassettes. The inserted C. albicans coding sequences may be synthesized by standard methods, isolated from natural sources, or prepared as hybrids, etc. Ligation of the C. albicans coding sequences to transcriptional regulatory elements and/or to other amino acid coding sequences may be achieved by known methods. Suitable host cells may be transformed/transfected/infected as appropriate by any suitable method including electroporation, CaCl₂ mediated DNA uptake, viral infection, microinjection, microprojectile, or other established methods.

Appropriate host cells include bacteria, archaebacteria, fungi, especially yeast, and plant and animal cells, especially mammalian cells. Of particular interest are C. albicans, E. coli, B. subtilis, Saccharomyces cerevisiae, Saccharomyces carlsbergensis, Schizosaccharomyces pombe, SF9 cells, C129 cells, 293 cells, Neurospora, and CHO cells, COS cells, HeLa cells, and immortalized mammalian myeloid and lymphoid cell lines. Preferred replication systems include M13, ColE1, SV40, baculovirus, lambda, adenovirus, and the like. A large number of transcription initiation and termination regulatory regions have been isolated and shown to be effective in the transcription and translation of heterologous proteins in the various hosts. Examples of these regions, methods of isolation, manner of manipulation, etc. are known in the art. Under appropriate expression conditions, host cells can be used as a source of recombinantly produced C. albicans-derived peptides and polypeptides.

Advantageously, vectors may also include a transcription regulatory element (i.e., a promoter) operably linked to the C. albicans portion. The promoter may optionally contain operator portions and/or ribosome binding sites. Non-limiting examples of bacterial promoters compatible with E. coli include: β-lactamase (penicillinase) promoter; lactose promoter; tryptophan (trp) promoter; araBAD (arabinose) operon promoter; lambda-derived PL promoter and N gene ribosome binding site; and the hybrid tac promoter derived from sequences of the trp and lac UV5 promoters. Non-limiting examples of yeast promoters include 3-phosphoglycerate kinase promoter, glyceraldehyde-3-phosphate dehydrogenase (GAPDH) promoter, galactokinase (GAL1) promoter, galactoepimerase promoter, and alcohol dehydrogenase (ADH) promoter. Suitable promoters for mammalian cells include without limitation viral promoters such as that from Simian Virus 40 (SV40), Rous sarcoma virus (RSV), adenovirus (ADV), and bovine papilloma virus (BPV). Mammalian cells may also require terminator sequences, polyA addition sequences and enhancer sequences to increase expression. Sequences which cause amplification of the gene may also be desirable. Furthermore, sequences that facilitate secretion of the recombinant product from cells, including, but not limited to, bacteria, yeast, and animal cells, such as secretory signal sequences and/or prohormone pro region sequences, may also be included. These sequences are well described in the art.

Nucleic acids encoding wild-type or variant C. albicans-derived polypeptides may also be introduced into cells by recombination events. For example, such a sequence can be introduced into a cell, and thereby effect homologous recombination at the site of an endogenous gene or a sequence with substantial identity to the gene. Other recombination-based methods such as nonhomologous recombinations or deletion of endogenous genes by homologous recombination may also be used.


Identification and Use of C. albicans Nucleic Acid Sequences

The disclosed C. albicans polypeptide and nucleic acid sequences, or other sequences that are contained within ORFs, including complete protein-coding sequences, of which any of the disclosed C. albicans-specific sequences forms a part, are useful as target components for diagnosis and/or treatment of C. albicans-caused infection.

It will be understood that the sequence of an entire protein-coding sequence of which each disclosed nucleic acid sequence forms a part can be isolated and identified based on each disclosed sequence. This can be achieved, for example, by using an isolated nucleic acid encoding the disclosed sequence, or fragments thereof, to prime a sequencing reaction with genomic C. albicans DNA as template; this is followed by sequencing the amplified product. The isolated nucleic acid encoding the disclosed sequence, or fragments thereof, can also be hybridized to C. albicans genomic libraries to identify clones containing additional complete segments of the protein-coding sequence of which the shorter sequence forms a part. Then, the entire protein-coding sequence, or fragments thereof, or nucleic acids encoding all or part of the sequence, or sequence-conservative or function-conservative variants thereof, may be employed in practicing the present invention.

Preferred sequences are those that are useful in diagnostic and/or therapeutic applications. Diagnostic applications include without limitation nucleic-acid-based and antibody-based methods for detecting fungal infection. Therapeutic applications include without limitation vaccines, passive immunotherapy, and drug treatments directed against gene products that are both unique to fungi and essential for growth and/or replication of fungi.

Identification of Nucleic Acids Encoding Vaccine Components and Targets for Agents Effective Against C. albicans

The disclosed C. albicans genome sequence includes segments that direct the synthesis of ribonucleic acids and polypeptides, as well as origins of replication, promoters, other types of regulatory sequences, and intergenic nucleic acids. The invention encompasses nucleic acids encoding immunogenic components of vaccines and targets for agents effective against C. albicans. Identification of said immunogenic components involves determining the function of the disclosed sequences, which can be achieved using a variety of approaches. Non-limiting examples of these approaches are described briefly below.

Homology to Known Sequences:

Computer-assisted comparison of the disclosed C. albicans sequences with previously reported sequences present in publicly available databases is useful for identifying functional C. albicans nucleic acid and polypeptide sequences. It will be understood that protein-coding sequences, for example, may be compared as a whole, and that a high degree of sequence homology between two proteins (such as, for example, >80-90%) at the amino acid level indicates that the two proteins also possess some degree of functional homology, such as, for example, among enzymes involved in metabolism, DNA synthesis, or cell wall synthesis, and proteins involved in transport, cell division, etc. In addition, many structural features of particular protein classes have been identified and correlate with specific consensus sequences, such as, for example, binding domains for nucleotides, DNA, metal ions, and other small molecules; sites for covalent modifications such as phosphorylation, acylation, and the like; sites of protein:protein interactions, etc. These consensus sequences may be quite short and thus may represent only a fraction of the entire protein-coding sequence. Identification of such a feature in a C. albicans sequence is therefore useful in determining the function of the encoded protein and identifying useful targets of antifungal drugs.
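Sequence homology between two aligned proteins, such as the >80-90% figure above, is often summarized as percent identity. Definitions differ in how gaps and the denominator are treated; the sketch below assumes the two sequences have already been aligned (for example, by a BLAST search) and are supplied as equal-length gapped strings, and it counts identical residues over all columns except those that are gaps in both sequences. The function name and example alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues between two pre-aligned, gapped sequences ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment contains no residues")
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


print(round(percent_identity("MKT-LV", "MKTALI"), 1))  # 66.7
```

Because gap handling changes the result, identity values are comparable only when computed with the same convention.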

Of particular relevance to the present invention are structural features that are common to secretory, transmembrane, and surface proteins, including secretion signal peptides and hydrophobic transmembrane domains. C. albicans proteins identified as containing putative signal sequences and/or transmembrane domains are useful as immunogenic components of vaccines.
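Hydrophobic transmembrane segments, and the hydrophobic core of many signal peptides, can be flagged with a sliding-window hydropathy analysis. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a 1.6 average-hydropathy threshold, values often quoted for transmembrane prediction; it is a coarse screen rather than a signal-peptide predictor, and the function name and example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6):
    """Return (start, mean_hydropathy) for each window whose mean is >= threshold.

    Residues not in the scale (e.g., X) are scored as 0.0.
    """
    protein = protein.upper()
    hits = []
    for i in range(len(protein) - window + 1):
        mean = sum(KD.get(aa, 0.0) for aa in protein[i:i + window]) / window
        if mean >= threshold:
            hits.append((i, round(mean, 2)))
    return hits


seq = "K" * 10 + "L" * 19 + "K" * 10  # hypothetical protein with one hydrophobic stretch
print(hydrophobic_windows(seq))
```

Proteins with a flagged stretch near the N-terminus, or with several flagged stretches, would be candidates for closer analysis as secreted or membrane proteins.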

Targets for therapeutic drugs according to the invention include, but are not limited to, polypeptides of the invention, whether unique to C. albicans or not, that are essential for growth and/or viability of C. albicans under at least one growth condition. Polypeptides essential for growth and/or viability can be determined by examining the effect of deleting and/or disrupting the genes, i.e., by so-called gene “knockout”. Alternatively, genetic footprinting can be used (Smith et al., 1995, Proc. Natl. Acad. Sci. USA 92:6479-6483; Published International Application WO 94/26933; U.S. Pat. No. 5,612,180). Still other methods for assessing essentiality include the ability to isolate conditional lethal mutations in the specific gene (e.g., temperature sensitive mutations). Other useful targets for therapeutic drugs, which include polypeptides that are not essential for growth or viability per se but lead to loss of viability of the cell, can be used to target therapeutic agents to cells.

Strain-Specific Sequences:

Because of the evolutionary relationship between different C. albicans strains, it is believed that the presently disclosed C. albicans sequences are useful for identifying, and/or discriminating between, previously known and new C. albicans strains. It is believed that other C. albicans strains will exhibit at least about 70% sequence homology with the presently disclosed sequence. Systematic and routine analyses of DNA sequences derived from samples containing C. albicans strains, and comparison with the present sequence, allow for the identification of sequences that can be used to discriminate between strains, as well as those that are common to all C. albicans strains. In one embodiment, the invention provides nucleic acids, including probes, and peptide and polypeptide sequences that discriminate between different strains of C. albicans. Strain-specific components can also be identified functionally by their ability to elicit or react with antibodies that selectively recognize one or more C. albicans strains.
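One routine way to sort sequence into strain-discriminating and strain-common portions is to compare fixed-length substrings (k-mers) between strains. The sketch below partitions the k-mers of two sequences into shared and strain-specific sets; it considers only the given strand and exact matches, and the value of k, the example sequences and the function names are assumptions chosen for illustration.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k (single strand, exact)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def partition_kmers(strain_a: str, strain_b: str, k: int = 20):
    """Return (shared, only_in_a, only_in_b) k-mer sets for two strain sequences."""
    a, b = kmers(strain_a, k), kmers(strain_b, k)
    return a & b, a - b, b - a


shared, only_a, only_b = partition_kmers("ACGTA", "ACGTT", k=3)
print(sorted(shared), sorted(only_a), sorted(only_b))  # ['ACG', 'CGT'] ['GTA'] ['GTT']
```

Shared k-mers are candidates for probes common to both strains, while strain-specific k-mers are candidates for discriminating probes; in practice the reverse-complement strand would also need to be considered.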

In another embodiment, the invention provides nucleic acids, including probes, and peptide and polypeptide sequences that are common to all C. albicans strains but are not found in other fungal species.

C. albicans Polypeptides

This invention encompasses isolated C. albicans polypeptides encoded by the disclosed C. albicans genomic sequences, including the polypeptides of the invention contained in the Sequence Listing. Polypeptides of the invention are preferably at least about 5 amino acid residues in length. Using the DNA sequence information provided herein, the amino acid sequences of the polypeptides encompassed by the invention can be deduced using methods well-known in the art. It will be understood that the sequence of an entire nucleic acid encoding a C. albicans polypeptide can be isolated and identified based on an ORF that encodes only a fragment of the cognate protein-coding region. This can be achieved, for example, by using the isolated nucleic acid encoding the ORF, or fragments thereof, to prime a polymerase chain reaction with genomic C. albicans DNA as template; this is followed by sequencing the amplified product.
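The deduction of an amino acid sequence from an ORF can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used to produce the Sequence Listing: it assumes an in-frame coding sequence and applies the C. albicans reading of CUG (CTG in DNA) as serine described in the Background, which corresponds to NCBI genetic code table 12. The names `translate`, `STANDARD_CODE` and `CANDIDA_CODE` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

# C. albicans decodes CUG (CTG in DNA) as serine instead of leucine.
CANDIDA_CODE = {**STANDARD_CODE, "CTG": "S"}


def translate(dna: str, code: dict = CANDIDA_CODE, to_stop: bool = True) -> str:
    """Translate an in-frame DNA or RNA sequence into one-letter amino acids."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = code.get(dna[i:i + 3], "X")  # "X" for codons containing ambiguous bases
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGCTGAAATAA"))                      # MSK
print(translate("ATGCTGAAATAA", code=STANDARD_CODE))  # MLK
```

The two calls differ only at the CTG codon, which is why a standard-code translation of a C. albicans ORF can misassign residues.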

The polypeptides of the present invention, including function-conservative variants of the disclosed ORFs, may be isolated from wild-type or mutant C. albicans cells, or from heterologous organisms or cells (including, but not limited to, bacteria, fungi, insect, plant, and mammalian cells), including C. albicans, into which a C. albicans-derived protein-coding sequence has been introduced and expressed. Furthermore, the polypeptides may be part of recombinant fusion proteins.

C. albicans polypeptides of the invention can be chemically synthesized using commercially available automated procedures such as those referenced herein, including, without limitation, exclusive solid phase synthesis, partial solid phase methods, fragment condensation or classical solution synthesis. The polypeptides are preferably prepared by solid phase peptide synthesis as described by Merrifield, 1963, J. Am. Chem. Soc. 85:2149. The synthesis is carried out with amino acids that are protected at the alpha-amino terminus. Trifunctional amino acids with labile side-chains are also protected with suitable groups to prevent undesired chemical reactions from occurring during the assembly of the polypeptides. The alpha-amino protecting group is selectively removed to allow subsequent reaction to take place at the amino-terminus. The conditions for the removal of the alpha-amino protecting group do not remove the side-chain protecting groups.

Methods for polypeptide purification are well-known in the art, including, without limitation, preparative disc-gel electrophoresis, isoelectric focusing, HPLC, reversed-phase HPLC, gel filtration, ion exchange and partition chromatography, and countercurrent distribution. For some purposes, it is preferable to produce the polypeptide in a recombinant system in which the C. albicans protein contains an additional sequence tag that facilitates purification, such as, but not limited to, a polyhistidine sequence. The polypeptide can then be purified from a crude lysate of the host cell by chromatography on an appropriate solid-phase matrix. Alternatively, antibodies produced against a C. albicans protein or against peptides derived therefrom can be used as purification reagents. Other purification methods are possible.

The present invention also encompasses derivatives and homologues of C. albicans-encoded polypeptides. For some purposes, nucleic acid sequences encoding the peptides may be altered by substitutions, additions, or deletions that provide for functionally equivalent molecules, i.e., function-conservative variants. For example, one or more amino acid residues within the sequence can be substituted by another amino acid of similar properties, such as, for example, positively charged amino acids (arginine, lysine, and histidine); negatively charged amino acids (aspartate and glutamate); polar neutral amino acids; and non-polar amino acids.
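A minimal sketch of how the residue classes above could be used to enumerate function-conservative substitutions at one position. The two charged classes follow the text; the membership of the polar neutral and non-polar classes is an assumption (a common textbook grouping), because the text does not list those residues. All names are hypothetical.

```python
# Residue classes for function-conservative substitution.
# The charged groups follow the text; the two neutral groups are assumed.
CLASSES = {
    "positive": set("RKH"),
    "negative": set("DE"),
    "polar_neutral": set("STNQCY"),  # assumed membership
    "nonpolar": set("GAVLIMFWP"),    # assumed membership
}


def residue_class(aa: str) -> str:
    for name, members in CLASSES.items():
        if aa in members:
            return name
    raise ValueError(f"unrecognized residue {aa!r}")


def is_conservative(original: str, replacement: str) -> bool:
    return residue_class(original) == residue_class(replacement)


def conservative_variants(seq: str, pos: int) -> list[str]:
    """All single substitutions at 0-based `pos` that stay within the residue's class."""
    members = CLASSES[residue_class(seq[pos])]
    return [seq[:pos] + aa + seq[pos + 1:] for aa in sorted(members) if aa != seq[pos]]


print(conservative_variants("AKD", 1))  # ['AHD', 'ARD']
```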

The isolated polypeptides may be modified by, for example, phosphorylation, sulfation, acylation, or other protein modifications. They may also be modified with a label capable of providing a detectable signal, either directly or indirectly, including, but not limited to, radioisotopes and fluorescent compounds.

To identify C. albicans-derived polypeptides for use in the present invention, essentially the complete genomic sequence of a C. albicans isolate was analyzed. While, in very rare instances, a nucleic acid sequencing error may be revealed, resolving a rare sequencing error is well within the art, and such an occurrence will not prevent one skilled in the art from practicing the invention.

Also encompassed are any C. albicans polypeptide sequences that are contained within the open reading frames (ORFs), including complete protein-coding sequences, of which any of SEQ ID NO: 1-SEQ ID NO: 14103 forms a part. Table 2, which is appended herewith and which forms part of the present specification, provides a putative identification of the particular function of a polypeptide which is encoded by each ORF based on the homology match (determined by the BLAST algorithm) of the predicted polypeptide with known proteins encoded by ORFs in other organisms. As a result, one skilled in the art can use the polypeptides of the present invention for commercial and industrial purposes consistent with the type of putative identification of the polypeptide.

The present invention provides a library of C. albicans-derived polypeptide sequences, and a corresponding library of nucleic acid sequences encoding the polypeptides, wherein the polypeptides themselves, or polypeptides contained within ORFs of which they form a part, comprise sequences that are contemplated for use as components of vaccines. Non-limiting examples of such sequences are listed by SEQ ID NO in Table 2, which is appended herewith and which forms part of the present specification.

The present invention also provides a library of C. albicans-derived polypeptide sequences, and a corresponding library of nucleic acid sequences encoding the polypeptides, wherein the polypeptides themselves, or polypeptides contained within ORFs of which they form a part, comprise sequences lacking homology to any known prokaryotic or eukaryotic sequences. Such libraries provide probes, primers, and markers which can be used to diagnose C. albicans infection, including use as markers in epidemiological studies. Non-limiting examples of such sequences are listed by SEQ ID NO in Table 2, which is appended herewith and which forms part of the present specification.

The present invention also provides a library of C. albicans-derived polypeptide sequences, and a corresponding library of nucleic acid sequences encoding the polypeptides, wherein the polypeptides themselves, or polypeptides contained within ORFs of which they form a part, comprise targets for therapeutic drugs.

Specific Example: Determination of Candidate Protein Antigens For Antibody And Vaccine Development

The selection of candidate protein antigens for vaccine development can be derived from the nucleic acids encoding C. albicans polypeptides. First, the ORFs can be analyzed for homology to other known exported or membrane proteins and analyzed using the discriminant analysis described by Klein et al. (Klein, P., Kanehisa, M., and DeLisi, C. (1985) Biochimica et Biophysica Acta 815, 468-476) for predicting exported and membrane proteins.

Homology searches can be performed using the BLAST algorithm contained in the Wisconsin Sequence Analysis Package (Genetics Computer Group, University Research Park, 575 Science Drive, Madison, Wis. 53711) to compare each predicted ORF amino acid sequence with all sequences found in the current GenBank, SWISS-PROT and PIR databases. BLAST searches for local alignments between the ORF and the databank sequences and reports a probability score which indicates the probability of finding this sequence by chance in the database. ORFs with significant homology (e.g., probabilities lower than 1×10⁻⁶ that the homology is only due to random chance) to membrane or exported proteins represent protein antigens for vaccine development. Possible functions can be assigned to C. albicans genes based on sequence homology to genes cloned in other organisms.
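Once the homology search has been run, the selection rule above reduces to a filter. The sketch below assumes the hits have already been parsed into simple records (ORF identifier, description of the matching database entry, reported probability score); the `Hit` record, the keyword list used to recognize membrane or exported proteins, and `antigen_candidates` are all hypothetical, and a real analysis would rely on curated annotations rather than keyword matching.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    orf_id: str
    subject: str   # description of the matching database entry
    score: float   # probability that the match arose by chance


THRESHOLD = 1e-6
# Hypothetical keywords marking membrane-associated or exported database proteins.
KEYWORDS = ("membrane", "secreted", "exported", "cell wall")


def antigen_candidates(hits: list[Hit], threshold: float = THRESHOLD) -> list[str]:
    """ORF ids with at least one significant hit to a membrane or exported protein."""
    selected = set()
    for hit in hits:
        description = hit.subject.lower()
        if hit.score < threshold and any(k in description for k in KEYWORDS):
            selected.add(hit.orf_id)
    return sorted(selected)


hits = [
    Hit("orf1", "plasma membrane ATPase", 3e-40),
    Hit("orf2", "secreted aspartyl protease", 2e-4),  # not significant
    Hit("orf3", "cytosolic enolase", 1e-50),          # not membrane/exported
]
print(antigen_candidates(hits))  # ['orf1']
```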

Discriminant analysis (Klein et al., supra) can be used to examine the ORF amino acid sequences. This algorithm uses the intrinsic information contained in the ORF amino acid sequence and compares it to information derived from the properties of known membrane and exported proteins. This comparison predicts which proteins will be exported, membrane associated or cytoplasmic. ORF amino acid sequences identified as exported or membrane associated by this algorithm are likely protein antigens for vaccine development.
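The Klein et al. discriminant function itself is not reproduced here. As a simpler, related illustration of predicting membrane association from intrinsic sequence properties, the sketch below swaps in a Kyte-Doolittle hydropathy scan: a 19-residue window whose mean hydropathy exceeds about 1.6 is a commonly used rough indicator of a membrane-spanning segment. The window, threshold and function names are assumptions, and this screen is not a substitute for the cited algorithm.

```python
# Kyte-Doolittle hydropathy scale (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def has_putative_tm_segment(seq: str, window: int = 19, threshold: float = 1.6) -> bool:
    """True if any window is hydrophobic enough to suggest a membrane-spanning segment."""
    return any(v > threshold for v in hydropathy_profile(seq, window))


print(has_putative_tm_segment("MKK" + "L" * 20 + "DEKR"))  # True
print(has_putative_tm_segment("DEKRS" * 8))                 # False
```

This only flags hydrophobic stretches; distinguishing exported from membrane-anchored proteins would need additional features.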

Production of Fragments and Analogs of C. albicans Nucleic Acids and Polypeptides

Based on the discovery of the C. albicans gene products of the invention provided in the Sequence Listing, one skilled in the art can alter the disclosed structure of C. albicans genes, e.g., by producing fragments or analogs, and test the newly produced structures for activity. Examples of techniques known to those skilled in the relevant art which allow the production and testing of fragments and analogs are discussed below. These, or analogous, methods can be used to make and screen libraries of polypeptides, e.g., libraries of random peptides or libraries of fragments or analogs of cellular proteins, for the ability to bind C. albicans polypeptides. Such screens are useful for the identification of inhibitors of C. albicans.

Generation of Fragments

Fragments of a protein can be produced in several ways, e.g., recombinantly, by proteolytic digestion, or by chemical synthesis. Internal or terminal fragments of a polypeptide can be generated by removing one or more nucleotides from one end (for a terminal fragment) or both ends (for an internal fragment) of a nucleic acid which encodes the polypeptide. Expression of the mutagenized DNA produces polypeptide fragments. Digestion with “end-nibbling” exonucleases can thus generate DNAs which encode an array of fragments. DNAs which encode fragments of a protein can also be generated by random shearing, restriction digestion or a combination of the above-discussed methods.

Fragments can also be chemically synthesized using techniques known in the art such as conventional Merrifield solid phase Fmoc or t-Boc chemistry. For example, peptides of the present invention may be arbitrarily divided into fragments of desired length with no overlap of the fragments, or divided into overlapping fragments of a desired length.
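The division into overlapping or non-overlapping fragments described above can be sketched as a sliding window. This is a minimal illustration; the function name and the choice to drop a trailing remainder shorter than the fragment length are assumptions.

```python
from typing import Optional


def split_into_fragments(seq: str, length: int, step: Optional[int] = None) -> list[str]:
    """Fragments of `length` residues taken every `step` residues.

    step == length (the default) gives non-overlapping fragments; step < length
    gives overlapping ones. A trailing remainder shorter than `length` is dropped.
    """
    step = length if step is None else step
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


peptide = "MKTAYIAKQR"
print(split_into_fragments(peptide, 4))          # ['MKTA', 'YIAK']
print(split_into_fragments(peptide, 4, step=2))  # ['MKTA', 'TAYI', 'YIAK', 'AKQR']
```

Overlapping sets are often preferred when mapping epitopes, since a binding site that straddles a boundary in the non-overlapping set still appears intact in some fragment.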

Alteration of Nucleic Acids and Polypeptides: Random Methods

Amino acid sequence variants of a protein can be prepared by random mutagenesis of DNA which encodes a protein or a particular domain or region of a protein. Useful methods include PCR mutagenesis and saturation mutagenesis. A library of random amino acid sequence variants can also be generated by the synthesis of a set of degenerate oligonucleotide sequences. (Methods for screening proteins in a library of variants are described elsewhere herein.)

PCR Mutagenesis

In PCR mutagenesis, reduced Taq polymerase fidelity is used to introduce random mutations into a cloned fragment of DNA (Leung et al., 1989, Technique 1:11-15). The DNA region to be mutagenized is amplified using the polymerase chain reaction (PCR) under conditions that reduce the fidelity of DNA synthesis by Taq DNA polymerase, e.g., by using a dGTP/dATP ratio of five and adding Mn²⁺ to the PCR reaction. The pool of amplified DNA fragments is inserted into appropriate cloning vectors to provide random mutant libraries.

Saturation Mutagenesis

Saturation mutagenesis allows for the rapid introduction of a large number of single base substitutions into cloned DNA fragments (Myers et al., 1985, Science 229:242). This technique includes generation of mutations, e.g., by chemical treatment or irradiation of single-stranded DNA in vitro, and synthesis of a complementary DNA strand. The mutation frequency can be modulated by modulating the severity of the treatment, and essentially all possible base substitutions can be obtained. Because this procedure does not involve a genetic selection for mutant fragments, both neutral substitutions and those that alter function are obtained. The distribution of point mutations is not biased toward conserved sequence elements.
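The statement that essentially all base substitutions can be obtained can be made concrete by enumerating the target space: a fragment of L bases has exactly 3L single-base substitution variants. The sketch below is an in-silico enumeration, for example for sizing or annotating a saturation library, and does not model the chemical or irradiation treatment itself; the function name is hypothetical.

```python
def single_base_substitutions(dna: str) -> list[tuple[int, str, str, str]]:
    """Every single-base substitution of `dna`: (0-based position, old base, new base, mutant)."""
    dna = dna.upper()
    variants = []
    for i, base in enumerate(dna):
        for alt in "ACGT":
            if alt != base:
                variants.append((i, base, alt, dna[:i] + alt + dna[i + 1:]))
    return variants


variants = single_base_substitutions("ATG")
print(len(variants))  # 9, i.e. 3 alternatives at each of 3 positions
print(variants[0])    # (0, 'A', 'C', 'CTG')
```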

Degenerate Oligonucleotides

A library of homologs can also be generated from a set of degenerate oligonucleotide sequences. Chemical synthesis of degenerate sequences can be carried out in an automatic DNA synthesizer, and the synthetic genes then ligated into an appropriate expression vector. The synthesis of degenerate oligonucleotides is known in the art (see, for example, Narang, S A (1983) Tetrahedron 39:3; Itakura et al. (1981) Recombinant DNA, Proc 3rd Cleveland Sympos. Macromolecules, ed. A G Walton, Amsterdam: Elsevier pp 273-289; Itakura et al. (1984) Annu. Rev. Biochem. 53:323; Itakura et al. (1977) Science 198:1056; Ike et al. (1983) Nucleic Acid Res. 11:477). Such techniques have been employed in the directed evolution of other proteins (see, for example, Scott et al. (1990) Science 249:386-390; Roberts et al. (1992) PNAS 89:2429-2433; Devlin et al. (1990) Science 249: 404-406; Cwirla et al. (1990) PNAS 87: 6378-6382; as well as U.S. Pat. Nos. 5,223,409, 5,198,346, and 5,096,815).
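A degenerate oligonucleotide is conveniently written with IUPAC ambiguity codes, and the size of the resulting library is the product of the number of bases allowed at each position. The sketch below expands such a pattern; the NNK codon used in the example (N = any base, K = G or T) is one common choice that covers all 20 amino acids and a single stop codon (TAG) in 32 codons, and is an illustration rather than a design taken from the text. Function names are hypothetical.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def library_size(pattern: str) -> int:
    """Number of distinct sequences encoded by a degenerate oligonucleotide."""
    size = 1
    for code in pattern.upper():
        size *= len(IUPAC[code])
    return size


def expand(pattern: str) -> list[str]:
    """Every concrete sequence represented by a degenerate pattern."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in pattern.upper()))]


print(library_size("NNK"))      # 32
print(library_size("NNK" * 6))  # 1073741824 (32**6)
print(expand("RY"))             # ['AC', 'AT', 'GC', 'GT']
```

Computing the size before expanding matters: a randomized hexapeptide cassette already encodes over 10⁹ DNA sequences, so full expansion is only practical for short patterns.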

Alteration of Nucleic Acids and Polypeptides: Methods for Directed Mutagenesis

Non-random, or directed, mutagenesis techniques can be used to provide specific sequences or mutations in specific regions. These techniques can be used to create variants which include, e.g., deletions, insertions, or substitutions of residues of the known amino acid sequence of a protein. The sites for mutation can be modified individually or in series, e.g., by (1) substituting first with conserved amino acids and then with more radical choices depending upon results achieved, (2) deleting the target residue, or (3) inserting residues of the same or a different class adjacent to the located site, or combinations of options 1-3.

Alanine Scanning Mutagenesis

Alanine scanning mutagenesis is a useful method for identification of certain residues or regions of the desired protein that are preferred locations or domains for mutagenesis (Cunningham and Wells, Science 244:1081-1085, 1989). In alanine scanning, a residue or group of target residues is identified (e.g., charged residues such as Arg, Asp, His, Lys, and Glu) and replaced by a neutral or negatively charged amino acid (most preferably alanine or polyalanine). Replacement of an amino acid can affect the interaction of the amino acids with the surrounding aqueous environment in or outside the cell. Those domains demonstrating functional sensitivity to the substitutions are then refined by introducing further or other variants at or for the sites of substitution. Thus, while the site for introducing an amino acid sequence variation is predetermined, the nature of the mutation per se need not be predetermined. For example, to optimize the performance of a mutation at a given site, alanine scanning or random mutagenesis may be conducted at the target codon or region and the expressed desired protein subunit variants are screened for the optimal combination of desired activity.
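The design step of an alanine scan, one variant per targeted residue, can be sketched at the protein level as follows. The charged target set follows the residues listed above; the function name and the conventional "K2A" variant naming (1-based position) are assumptions for illustration.

```python
CHARGED = frozenset("RDHKE")  # Arg, Asp, His, Lys, Glu, as listed in the text


def alanine_scan(seq: str, targets: frozenset = CHARGED) -> list[tuple[str, str]]:
    """One variant per target residue, named in the usual K2A style (1-based)."""
    variants = []
    for i, aa in enumerate(seq):
        if aa in targets:
            variants.append((f"{aa}{i + 1}A", seq[:i] + "A" + seq[i + 1:]))
    return variants


for name, variant in alanine_scan("MKTDL"):
    print(name, variant)
# K2A MATDL
# D4A MKTAL
```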

Oligonucleotide-Mediated Mutagenesis

Oligonucleotide-mediated mutagenesis is a useful method for preparing substitution, deletion, and insertion variants of DNA (see, e.g., Adelman et al., DNA 2:183, 1983). Briefly, the desired DNA is altered by hybridizing an oligonucleotide encoding a mutation to a DNA template, where the template is the single-stranded form of a plasmid or bacteriophage containing the unaltered or native DNA sequence of the desired protein. After hybridization, a DNA polymerase is used to synthesize an entire second complementary strand of the template that will thus incorporate the oligonucleotide primer, and will code for the selected alteration in the desired protein DNA. Generally, oligonucleotides of at least about 25 nucleotides in length are used. An optimal oligonucleotide will have 12 to 15 nucleotides that are completely complementary to the template on either side of the nucleotide(s) coding for the mutation. This ensures that the oligonucleotide will hybridize properly to the single-stranded DNA template molecule. The oligonucleotides are readily synthesized using techniques known in the art such as that described by Crea et al. (Proc. Natl. Acad. Sci. USA, 75: 5765 [1978]).
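The design rule above (the mutant base(s) flanked on each side by 12 to 15 template-matching nucleotides) can be sketched for a simple substitution. This is a minimal sketch under stated assumptions: positions are 0-based on the coding strand, only same-length substitutions are handled, and if the single-stranded template carries the coding strand the oligo is returned as the reverse complement so that it can anneal. The function names and the toy target are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def mutagenic_oligo(coding: str, start: int, new_bases: str, flank: int = 15,
                    template_is_coding: bool = True) -> str:
    """Oligo replacing len(new_bases) bases at 0-based `start`, with `flank`
    matching bases on each side of the mutation.

    If the single-stranded template carries the coding strand, the oligo must be
    its reverse complement in order to anneal.
    """
    coding = coding.upper()
    end = start + len(new_bases)
    if start < flank or end + flank > len(coding):
        raise ValueError("not enough flanking sequence around the mutation site")
    mutant = coding[start - flank:start] + new_bases.upper() + coding[end:end + flank]
    return reverse_complement(mutant) if template_is_coding else mutant


site = "A" * 15 + "GGG" + "C" * 15  # hypothetical target: codon GGG at position 15
oligo = mutagenic_oligo(site, 15, "TTT", flank=12)
print(oligo, len(oligo))
```

With 12-nucleotide flanks and a codon substitution the oligo is 27 nucleotides long, consistent with the guideline of at least about 25.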

Cassette Mutagenesis

Another method for preparing variants, cassette mutagenesis, is based on the technique described by Wells et al. (Gene, 34:315 [1985]). The starting material is a plasmid (or other vector) which includes the protein subunit DNA to be mutated. The codon(s) in the protein subunit DNA to be mutated are identified. There must be a unique restriction endonuclease site on each side of the identified mutation site(s). If no such restriction sites exist, they may be generated using the above-described oligonucleotide-mediated mutagenesis method to introduce them at appropriate locations in the desired protein subunit DNA. After the restriction sites have been introduced into the plasmid, the plasmid is cut at these sites to linearize it. A double-stranded oligonucleotide encoding the sequence of the DNA between the restriction sites but containing the desired mutation(s) is synthesized using standard procedures. The two strands are synthesized separately and then hybridized together using standard techniques. This double-stranded oligonucleotide is referred to as the cassette. This cassette is designed to have 3′ and 5′ ends that are compatible with the ends of the linearized plasmid, such that it can be directly ligated to the plasmid. This plasmid now contains the mutated desired protein subunit DNA sequence.
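The precondition for cassette mutagenesis, a unique restriction site on each side of the mutation, can be checked computationally. The sketch below uses a small illustrative subset of enzymes with palindromic recognition sequences, so scanning one strand suffices; the enzyme table, function names and toy sequence are assumptions, and a real check would use a complete enzyme database and the full plasmid sequence.

```python
import re

# A small, illustrative subset of palindromic recognition sequences.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT", "XbaI": "TCTAGA"}


def unique_sites(seq: str, enzymes: dict = ENZYMES) -> dict[str, int]:
    """Enzymes whose recognition sequence occurs exactly once, with its 0-based start."""
    seq = seq.upper()
    unique = {}
    for name, motif in enzymes.items():
        # The lookahead also finds overlapping occurrences.
        starts = [m.start() for m in re.finditer(f"(?={motif})", seq)]
        if len(starts) == 1:
            unique[name] = starts[0]
    return unique


def cassette_sites(seq: str, mut_start: int, mut_end: int, enzymes: dict = ENZYMES):
    """Unique sites lying wholly upstream and wholly downstream of [mut_start, mut_end)."""
    sites = unique_sites(seq, enzymes)
    left = {n: p for n, p in sites.items() if p + len(enzymes[n]) <= mut_start}
    right = {n: p for n, p in sites.items() if p >= mut_end}
    return left, right


plasmid = "TT" + "GAATTC" + "A" * 10 + "AAGCTT" + "TT"  # hypothetical toy sequence
print(cassette_sites(plasmid, 10, 13))  # ({'EcoRI': 2}, {'HindIII': 18})
```

If either returned dictionary is empty, a site would first have to be introduced, for example by the oligonucleotide-mediated method described above.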

Combinatorial Mutagenesis

Combinatorial mutagenesis can also be used to generate mutants (Ladner et al., WO 88/06630). In this method, the amino acid sequences for a group of homologs or other related proteins are aligned, preferably to promote the highest homology possible. All of the amino acids which appear at a given position of the aligned sequences can be selected to create a degenerate set of combinatorial sequences. The variegated library of variants is generated by combinatorial mutagenesis at the nucleic acid level, and is encoded by a variegated gene library. For example, a mixture of synthetic oligonucleotides can be enzymatically ligated into gene sequences such that the degenerate set of potential sequences is expressible as individual peptides, or alternatively, as a set of larger fusion proteins containing the set of degenerate sequences.
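The selection step described above, taking every residue observed at each aligned position and combining them, can be sketched at the protein level. This assumes the homologs are already aligned to equal length with "-" for gaps; as a simplification, gaps are not offered as a choice and columns that are gaps in every sequence are dropped. Function names are hypothetical.

```python
from itertools import product


def position_choices(aligned: list[str]) -> list[list[str]]:
    """Residues observed at each column of a gap-padded alignment ('-' = gap)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    choices = [sorted(set(col) - {"-"}) for col in zip(*aligned)]
    return [c for c in choices if c]  # drop columns that are gaps in every sequence


def combinatorial_library(aligned: list[str]) -> list[str]:
    """Every combination of the residues observed at each aligned position."""
    return ["".join(p) for p in product(*position_choices(aligned))]


homologs = ["AKL", "ARL", "SKL"]
print(combinatorial_library(homologs))  # ['AKL', 'ARL', 'SKL', 'SRL']
```

The example shows the point of the method: "SRL" is not among the parent sequences but is a member of the variegated library.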

Other Modifications of C. albicans Nucleic Acids and Polypeptides

It is possible to modify the structure of a C. albicans polypeptide for such purposes as increasing solubility or enhancing stability (e.g., shelf life ex vivo and resistance to proteolytic degradation in vivo). A modified C. albicans protein or peptide can be produced in which the amino acid sequence has been altered, such as by amino acid substitution, deletion, or addition as described herein.

A C. albicans peptide can also be modified by substitution of cysteine residues, preferably with alanine, serine, threonine, leucine or glutamic acid residues, to minimize dimerization via disulfide linkages. In addition, amino acid side chains of fragments of the protein of the invention can be chemically modified. Another modification is cyclization of the peptide.

In order to enhance stability and/or reactivity, a C. albicans polypeptide can be modified to incorporate one or more polymorphisms in the amino acid sequence of the protein resulting from any natural allelic variation. Additionally, D-amino acids, non-natural amino acids, or non-amino acid analogs can be substituted or added to produce a modified protein within the scope of this invention. Furthermore, a C. albicans polypeptide can be modified using polyethylene glycol (PEG) according to the method of A. Sehon and co-workers (Wie et al., supra) to produce a protein conjugated with PEG. In addition, PEG can be added during chemical synthesis of the protein. Other modifications of C. albicans proteins include reduction/alkylation (Tarr, Methods of Protein Microcharacterization, J. E. Silver ed., Humana Press, Clifton N.J. 155-194 (1986)); acylation (Tarr, supra); chemical coupling to an appropriate carrier (Mishell and Shiigi, eds, Selected Methods in Cellular Immunology, WH Freeman, San Francisco, Calif. (1980); U.S. Pat. No. 4,939,239); or mild formalin treatment (Marsh, (1971) Int. Arch. of Allergy and Appl. Immunol., 41: 199-215).

To facilitate purification and potentially increase solubility of a C. albicans protein or peptide, it is possible to add an amino acid fusion moiety to the peptide backbone. For example, hexa-histidine can be added to the protein for purification by immobilized metal ion affinity chromatography (Hochuli, E. et al., (1988) Bio/Technology, 6: 1321-1325). In addition, to facilitate isolation of peptides free of irrelevant sequences, specific endoprotease cleavage sites can be introduced between the sequences of the fusion moiety and the peptide.

To potentially aid proper antigen processing of epitopes within a C. albicans polypeptide, canonical protease sensitive sites can be engineered between regions, each comprising at least one epitope, via recombinant or synthetic methods. For example, charged amino acid pairs, such as KK or RR, can be introduced between regions within a protein or fragment during recombinant construction thereof. The resulting peptide can be rendered sensitive to cleavage by cathepsin and/or other trypsin-like enzymes which would generate portions of the protein containing one or more epitopes. In addition, such charged amino acid residues can result in an increase in the solubility of the peptide.
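The effect of inserting a KK pair between epitope-bearing regions can be illustrated with an in-silico digest. The sketch assumes a simplified trypsin-like rule (cleave after K or R unless the next residue is P); this rule does not capture cathepsin specificity, and the region sequences and function names are hypothetical. The example regions are chosen to contain no internal K or R, so only the engineered linker is cut.

```python
import re


def join_with_linker(regions: list[str], linker: str = "KK") -> str:
    """Concatenate epitope-bearing regions with a charged dipeptide between them."""
    return linker.join(regions)


def trypsin_like_digest(seq: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (simplified trypsin-like rule)."""
    return [frag for frag in re.split(r"(?<=[KR])(?!P)", seq) if frag]


construct = join_with_linker(["GLFDAV", "YNNQSA"])  # hypothetical regions
print(construct)                       # GLFDAVKKYNNQSA
print(trypsin_like_digest(construct))  # ['GLFDAVK', 'K', 'YNNQSA']
```

Splitting on a zero-width pattern requires Python 3.7 or later.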

Primary Methods for Screening Polypeptides and Analogs

Various techniques are known in the art for screening generated mutant gene products. Techniques for screening large gene libraries often include cloning the gene library into replicable expression vectors, transforming appropriate cells with the resulting library of vectors, and expressing the genes under conditions in which detection of a desired activity, e.g., in this case, binding to a C. albicans polypeptide or an interacting protein, facilitates relatively easy isolation of the vector encoding the gene whose product was detected. Each of the techniques described below is amenable to high-throughput analysis for screening large numbers of sequences created, e.g., by random mutagenesis techniques.

Two Hybrid Systems

Two-hybrid assays (as with the other screening methods described herein) can be used to identify polypeptides, e.g., fragments or analogs of a naturally occurring C. albicans polypeptide, of cellular proteins, or of randomly generated polypeptides, which bind to a C. albicans protein. (The C. albicans domain is used as the bait protein and the library of variants is expressed as prey fusion proteins.)

Display Libraries

In one approach to screening assays, the candidate peptides are displayed on the surface of a cell or viral particle, and the ability of particular cells or viral particles to bind an appropriate receptor protein via the displayed product is detected in a “panning assay”. For example, the gene library can be cloned into the gene for a surface membrane protein of a fungal cell, and the resulting fusion protein detected by panning (Ladner et al., WO 88/06630; Fuchs et al. (1991) Bio/Technology 9:1370-1371; and Goward et al. (1992) TIBS 18:136-140). In a similar fashion, a detectably labeled ligand can be used to score for potentially functional peptide homologs. Fluorescently labeled ligands, e.g., receptors, can be used to detect homologs which retain ligand-binding activity. The use of fluorescently labeled ligands allows cells to be visually inspected and separated under a fluorescence microscope, or, where the morphology of the cell permits, to be separated by a fluorescence-activated cell sorter.

A gene library can be expressed as a fusion protein on the surface of a viral particle. For instance, in the filamentous phage system, foreign peptide sequences can be expressed on the surface of infectious phage, thereby conferring two significant benefits. First, since these phage can be applied to affinity matrices at concentrations well over 10¹³ phage per milliliter, a large number of phage can be screened at one time. Second, since each infectious phage displays a gene product on its surface, if a particular phage is recovered from an affinity matrix in low yield, the phage can be amplified by another round of infection. The group of almost identical E. coli filamentous phages M13, fd, and f1 are most often used in phage display libraries. Either of the phage gIII or gVIII coat proteins can be used to generate fusion proteins without disrupting the ultimate packaging of the viral particle. Foreign epitopes can be expressed at the NH₂-terminal end of pIII and phage bearing such epitopes recovered from a large excess of phage lacking this epitope (Ladner et al. PCT publication WO 90/02909; Garrard et al., PCT publication WO 92/09690; Marks et al. (1992) J. Biol. Chem. 267:16007-16010; Griffiths et al. (1993) EMBO J 12:725-734; Clackson et al. (1991) Nature 352:624-628; and Barbas et al. (1992) PNAS 89:4457-4461).

A common approach uses the maltose receptor of E. coli (the outer membrane protein LamB) as a peptide fusion partner (Charbit et al. (1986) EMBO 5, 3029-3037). Oligonucleotides have been inserted into plasmids encoding the LamB gene to produce peptides fused into one of the extracellular loops of the protein. These peptides are available for binding to ligands, e.g., to antibodies, and can elicit an immune response when the cells are administered to animals. Other cell surface proteins, e.g., OmpA (Schorr et al. (1991) Vaccines 91, pp. 387-392), PhoE (Agterberg, et al. (1990) Gene 88, 37-45), and PAL (Fuchs et al. (1991) Bio/Tech 9, 1369-1372), as well as large bacterial surface structures have served as vehicles for peptide display. Peptides can be fused to pilin, a protein which polymerizes to form the pilus (a conduit for interbacterial exchange of genetic information) (Thiry et al. (1989) Appl. Environ. Microbiol. 55, 984-993). Because of its role in interacting with other cells, the pilus provides a useful support for the presentation of peptides to the extracellular environment. Another large surface structure used for peptide display is the bacterial motive organ, the flagellum. Fusion of peptides to the subunit protein flagellin offers a dense array of many peptide copies on the host cells (Kuwajima et al. (1988) Bio/Tech. 6, 1080-1083). Surface proteins of other bacterial species have also served as peptide fusion partners. Examples include the Staphylococcus protein A and the outer membrane IgA protease of Neisseria (Hansson et al. (1992) J. Bacteriol. 174, 4239-4245 and Klauser et al. (1990) EMBO J. 9, 1991-1999).

In the filamentous phage systems and the LamB system described above, the physical link between the peptide and its encoding DNA occurs by the containment of the DNA within a particle (cell or phage) that carries the peptide on its surface. Capturing the peptide captures the particle and the DNA within. An alternative scheme uses the DNA-binding protein LacI to form a link between peptide and DNA (Cull et al. (1992) PNAS USA 89:1865-1869). This system uses a plasmid containing the LacI gene with an oligonucleotide cloning site at its 3′-end. Under controlled induction by arabinose, a LacI-peptide fusion protein is produced. This fusion retains the natural ability of LacI to bind to a short DNA sequence known as the lac operator (LacO). By installing two copies of LacO on the expression plasmid, the LacI-peptide fusion binds tightly to the plasmid that encoded it. Because the plasmids in each cell contain only a single oligonucleotide sequence and each cell expresses only a single peptide sequence, the peptides become specifically and stably associated with the DNA sequence that directed their synthesis. The cells of the library are gently lysed and the peptide-DNA complexes are exposed to a matrix of immobilized receptor to recover the complexes containing active peptides. The associated plasmid DNA is then reintroduced into cells for amplification and DNA sequencing to determine the identity of the peptide ligands. As a demonstration of the practical utility of the method, a large random library of dodecapeptides was made and selected on a monoclonal antibody raised against the opioid peptide dynorphin B. A cohort of peptides was recovered, all related by a consensus sequence corresponding to a six-residue portion of dynorphin B (Cull et al. (1992) Proc. Natl. Acad. Sci. U.S.A. 89:1865-1869).

This scheme, sometimes referred to as peptides-on-plasmids, differs in two important ways from the phage display methods. First, the peptides are attached to the C-terminus of the fusion protein, resulting in the display of the library members as peptides having free carboxy termini. Both of the filamentous phage coat proteins, pIII and pVIII, are anchored to the phage through their C-termini, and the guest peptides are placed into the outward-extending N-terminal domains. In some designs, the phage-displayed peptides are presented right at the amino terminus of the fusion protein (Cwirla, et al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 6378-6382). A second difference is the set of biological biases affecting the population of peptides actually present in the libraries. The LacI fusion molecules are confined to the cytoplasm of the host cells. The phage coat fusions are exposed briefly to the cytoplasm during translation but are rapidly secreted through the inner membrane into the periplasmic compartment, remaining anchored in the membrane by their C-terminal hydrophobic domains, with the N-termini, containing the peptides, protruding into the periplasm while awaiting assembly into phage particles. The peptides in the LacI and phage libraries may differ significantly as a result of their exposure to different proteolytic activities. The phage coat proteins require transport across the inner membrane and signal peptidase processing as a prelude to incorporation into phage. Certain peptides exert a deleterious effect on these processes and are underrepresented in the libraries (Gallop et al. (1994) J. Med. Chem. 37(9):1233-1251). These particular biases are not a factor in the LacI display system.

The number of small peptides available in recombinant random libraries is enormous. Libraries of 10⁷-10⁹ independent clones are routinely prepared. Libraries as large as 10¹¹ recombinants have been created, but this size approaches the practical limit for clone libraries. This limitation in library size occurs at the step of transforming the DNA containing randomized segments into the host bacterial cells. To circumvent this limitation, an in vitro system based on the display of nascent peptides in polysome complexes has recently been developed. This display library method has the potential of producing libraries 3-6 orders of magnitude larger than the currently available phage/phagemid or plasmid libraries. Furthermore, the construction of the libraries, expression of the peptides, and screening are done in an entirely cell-free format.

In one application of this method (Gallop et al. (1994) J. Med. Chem. 37(9):1233-1251), a molecular DNA library encoding 10¹² decapeptides was constructed and the library expressed in an E. coli S30 in vitro coupled transcription/translation system. Conditions were chosen to stall the ribosomes on the mRNA, causing the accumulation of a substantial proportion of the RNA in polysomes and yielding complexes containing nascent peptides still linked to their encoding RNA. The polysomes are sufficiently robust to be affinity purified on immobilized receptors in much the same way as the more conventional recombinant peptide display libraries are screened. RNA from the bound complexes is recovered, converted to cDNA, and amplified by PCR to produce a template for the next round of synthesis and screening. The polysome display method can be coupled to the phage display system. Following several rounds of screening, cDNA from the enriched pool of polysomes was cloned into a phagemid vector. This vector serves both as a peptide expression vector, displaying peptides fused to the coat proteins, and as a DNA sequencing vector for peptide identification. By expressing the polysome-derived peptides on phage, one can either continue the affinity selection procedure in this format or assay the peptides on individual clones for binding activity in a phage ELISA, or for binding specificity in a competition phage ELISA (Barret, et al. (1992) Anal. Biochem 204, 357-364). To identify the sequences of the active peptides, one sequences the DNA produced by the phagemid host.
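The library sizes quoted above can be put in context with simple arithmetic: with 20 amino acids there are 20¹⁰ ≈ 1.0×10¹³ possible decapeptides, so even a 10¹² polysome library samples at most about a tenth of decapeptide space, while a 10⁹ clone library samples far less. The sketch below assumes all 20 residues are equally represented and ignores duplicate clones and codon bias, so the fractions are upper bounds; function names are hypothetical.

```python
def sequence_space(length: int, alphabet: int = 20) -> int:
    """Number of distinct peptides of a given length over `alphabet` residues."""
    return alphabet ** length


def fraction_sampled(library_size: int, length: int) -> float:
    """Upper bound on the fraction of sequence space a library can cover."""
    return min(1.0, library_size / sequence_space(length))


print(sequence_space(10))            # 10240000000000, about 1.0 x 10^13
print(fraction_sampled(10**12, 10))  # 0.09765625
print(fraction_sampled(10**9, 10))
```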

Secondary Screening of Polypeptides and Analogs

The high-throughput assays described above can be followed by secondary screens in order to identify further biological activities which will, e.g., allow one skilled in the art to differentiate agonists from antagonists. The type of secondary screen used will depend on the desired activity that needs to be tested. For example, an assay can be developed in which the ability to inhibit an interaction between a protein of interest and its respective ligand is used to identify antagonists from a group of peptide fragments isolated through one of the primary screens described above.

Methods for generating fragments and analogs and testing them for activity are thus known in the art. Once the core sequence of interest is identified, it is routine for one skilled in the art to obtain analogs and fragments.

Peptide Mimetics of C. albicans Polypeptides

The invention also provides for reduction of the protein binding domains of the subject C. albicans polypeptides to generate mimetics, e.g., peptide or non-peptide agents. The peptide mimetics are able to disrupt binding of a polypeptide to its counter ligand, e.g., in the case of a C. albicans polypeptide binding to a naturally occurring ligand. The critical residues of a subject C. albicans polypeptide which are involved in molecular recognition of a polypeptide can be determined and used to generate C. albicans-derived peptidomimetics which competitively or noncompetitively inhibit binding of the C. albicans polypeptide with an interacting polypeptide (see, for example, European patent applications EP-412,762A and EP-B31,080A).

For example, scanning mutagenesis can be used to map the amino acid residues of a particular C. albicans polypeptide involved in binding an interacting polypeptide. Peptidomimetic compounds (e.g., diazepine or isoquinoline derivatives) can then be generated which mimic those residues in binding to an interacting polypeptide, and which therefore can inhibit binding of a C. albicans polypeptide to an interacting polypeptide and thereby interfere with the function of the C. albicans polypeptide. For instance, non-hydrolyzable peptide analogs of such residues can be generated using benzodiazepine (e.g., see Freidinger et al. in Peptides: Chemistry and Biology, G. R. Marshall ed., ESCOM Publisher: Leiden, Netherlands, 1988), azepine (e.g., see Huffman et al. in Peptides: Chemistry and Biology, G. R. Marshall ed., ESCOM Publisher: Leiden, Netherlands, 1988), substituted gamma lactam rings (Garvey et al. in Peptides: Chemistry and Biology, G. R. Marshall ed., ESCOM Publisher: Leiden, Netherlands, 1988), keto-methylene pseudopeptides (Ewenson et al. (1986) J Med Chem 29:295; and Ewenson et al. in Peptides: Structure and Function (Proceedings of the 9th American Peptide Symposium) Pierce Chemical Co. Rockland, Ill., 1985), β-turn dipeptide cores (Nagai et al. (1985) Tetrahedron Lett 26:647; and Sato et al. (1986) J Chem Soc Perkin Trans 1:1231), and β-aminoalcohols (Gordon et al. (1985) Biochem Biophys Res Commun 126:419; and et al. (1986) Biochem Biophys Res Commun 134:71).
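Scanning mutagenesis is often performed as an alanine scan, in which each residue is replaced in turn by alanine; the text above does not specify the substituting residue, so alanine is an assumption here. The Python sketch below generates the single-substitution variants of a peptide and, given binding signals measured for each variant (hypothetical input), flags positions whose substitution reduces binding below a chosen fraction of wild type. The 50% cut-off, the example peptide, and all function names are illustrative assumptions.

```python
from __future__ import annotations


def alanine_scan(peptide: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, original residue, mutant sequence) for every
    single alanine substitution. Positions that are already alanine are skipped."""
    peptide = peptide.upper()
    variants = []
    for i, residue in enumerate(peptide):
        if residue == "A":
            continue  # nothing to substitute at this position
        variants.append((i + 1, residue, peptide[:i] + "A" + peptide[i + 1:]))
    return variants


def binding_hot_spots(wild_type_signal: float,
                      mutant_signals: dict[int, float],
                      cutoff: float = 0.5) -> list[int]:
    """Positions whose alanine variant retains less than `cutoff` of the
    wild-type binding signal (assumed to mark residues needed for binding)."""
    if wild_type_signal <= 0:
        raise ValueError("wild-type signal must be positive")
    return sorted(pos for pos, signal in mutant_signals.items()
                  if signal / wild_type_signal < cutoff)


if __name__ == "__main__":
    for pos, res, seq in alanine_scan("KLWSPRG"):  # hypothetical peptide
        print(pos, res, seq)
```

Residues flagged this way would be the candidates whose side chains a peptidomimetic scaffold, such as those listed above, would need to reproduce.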

Vaccine Formulations for C. albicans Nucleic Acids and Polypeptides

This invention also features vaccine compositions for protection against infection by C. albicans or for treatment of C. albicans infection. In one embodiment, the vaccine compositions contain one or more immunogenic components such as a surface protein from C. albicans, or portion thereof, and a pharmaceutically acceptable carrier. Nucleic acids within the scope of the invention are exemplified by the nucleic acids of the invention contained in the Sequence Listing which encode C. albicans surface proteins. Any nucleic acid encoding an immunogenic C. albicans protein, or portion thereof, which is capable of expression in a cell, can be used in the present invention. These vaccines have therapeutic and prophylactic utilities.

One aspect of the invention provides a vaccine composition for protection against infection by C. albicans which contains at least one immunogenic fragment of a C. albicans protein and a pharmaceutically acceptable carrier. Preferred fragments include peptides of at least about 10 amino acid residues in length, preferably about 10-20 amino acid residues in length, and more preferably about 12-16 amino acid residues in length.

Immunogenic components of the invention can be obtained, for example, by screening polypeptides recombinantly produced from the corresponding fragment of the nucleic acid encoding the full-length C. albicans protein. In addition, fragments can be chemically synthesized using techniques known in the art such as conventional Merrifield solid phase Fmoc or t-Boc chemistry.
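When fragments are to be chemically synthesized for screening, one common design is a set of overlapping peptides tiling the full-length protein. The Python sketch below assumes 15-residue peptides (within the preferred 12-16 residue range stated above) whose start positions are 5 residues apart; both values are illustrative choices, not taken from this document. With that spacing, any contiguous stretch of up to length - step + 1 = 11 residues lies entirely within at least one peptide, which leaves room for a short linear epitope.

```python
from __future__ import annotations


def overlapping_peptides(protein: str, length: int = 15,
                         step: int = 5) -> list[tuple[int, str]]:
    """Tile `protein` with peptides of `length` residues whose start positions
    are `step` residues apart. Returns (1-based start, peptide) pairs. The last
    peptide is anchored at the C-terminus so that no residue is left uncovered."""
    if length <= 0 or step <= 0 or step > length:
        raise ValueError("need 0 < step <= length")
    protein = protein.upper()
    if len(protein) <= length:
        return [(1, protein)]
    last_start = len(protein) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # extra C-terminal peptide
    return [(s + 1, protein[s:s + length]) for s in starts]


if __name__ == "__main__":
    demo = "ACDEFGHIKLMNPQRSTVWY" * 2  # hypothetical 40-residue sequence
    for start, pep in overlapping_peptides(demo):
        print(start, pep)
```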

In one embodiment, immunogenic components are identified by the ability of the peptide to stimulate T cells. Peptides which stimulate T cells, as determined by, for example, T cell proliferation or cytokine secretion, are defined herein as comprising at least one T cell epitope. T cell epitopes are believed to be involved in initiation and perpetuation of the immune response to a protein antigen, for example a protein allergen responsible for the clinical symptoms of allergy. These T cell epitopes are thought to trigger early events at the level of the T helper cell by binding to an appropriate HLA molecule on the surface of an antigen presenting cell, thereby stimulating the T cell subpopulation with the relevant T cell receptor for the epitope. These events lead to T cell proliferation, lymphokine secretion, local inflammatory reactions, recruitment of additional immune cells to the site of antigen/T cell interaction, and activation of the B cell cascade, leading to the production of antibodies. A T cell epitope is the basic element, or smallest unit of recognition by a T cell receptor, where the epitope comprises amino acids essential to receptor recognition (e.g., approximately 6 or 7 amino acid residues). Amino acid sequences which mimic those of the T cell epitopes are within the scope of this invention.

Screening immunogenic components can be accomplished using one or more of several different assays. For example, in vitro, peptide T cell stimulatory activity is assayed by contacting a peptide known or suspected of being immunogenic with an antigen presenting cell which presents appropriate MHC molecules in a T cell culture. Presentation of an immunogenic C. albicans peptide in association with appropriate MHC molecules to T cells, in conjunction with the necessary co-stimulation, transmits a signal to the T cell that induces the production of increased levels of cytokines, particularly of interleukin-2 and interleukin-4. The culture supernatant can be obtained and assayed for interleukin-2 or other known cytokines. For example, any one of several conventional assays for interleukin-2 can be employed, such as the assay described in Proc. Natl. Acad. Sci. USA, 86: 1333 (1989), the pertinent portions of which are incorporated herein by reference. A kit for an assay for the production of interferon is also available from Genzyme Corporation (Cambridge, Mass.).

Alternatively, a common assay for T cell proliferation entails measuring tritiated thymidine incorporation. The proliferation of T cells can be measured in vitro by determining the amount of ³H-labeled thymidine incorporated into the replicating DNA of cultured cells. The rate of DNA synthesis and, in turn, the rate of cell division can therefore be quantified.
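Thymidine incorporation data are commonly summarized as a stimulation index (SI): the mean counts per minute (cpm) of cultures exposed to the test peptide divided by the mean cpm of cultures without it. The Python sketch below computes this ratio from replicate wells. The positivity cut-off of 2 is an assumed convention rather than a value given in this document, and the cpm figures are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def stimulation_index(peptide_cpm: list[float], medium_cpm: list[float]) -> float:
    """Mean ³H-thymidine cpm with peptide divided by mean cpm with medium alone."""
    background = mean(medium_cpm)
    if background <= 0:
        raise ValueError("background cpm must be positive")
    return mean(peptide_cpm) / background


def is_stimulatory(si: float, cutoff: float = 2.0) -> bool:
    """Assumed positivity rule: SI at or above `cutoff`."""
    return si >= cutoff


if __name__ == "__main__":
    si = stimulation_index([3000, 3300, 2700], [1000, 1100, 900])  # hypothetical triplicates
    print(f"SI = {si:.1f}, stimulatory: {is_stimulatory(si)}")
```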

Vaccine compositions of the invention containing immunogenic components (e.g., C. albicans polypeptide or fragment thereof or nucleic acid encoding a C. albicans polypeptide or fragment thereof) preferably include a pharmaceutically acceptable carrier. The term “pharmaceutically acceptable carrier” refers to a carrier that does not cause an allergic reaction or other untoward effect in patients to whom it is administered. Suitable pharmaceutically acceptable carriers include, for example, one or more of water, saline, phosphate buffered saline, dextrose, glycerol, ethanol and the like, as well as combinations thereof. Pharmaceutically acceptable carriers may further comprise minor amounts of auxiliary substances such as wetting or emulsifying agents, preservatives or buffers, which enhance the shelf life or effectiveness of the vaccine composition. For vaccines of the invention containing C. albicans polypeptides, the polypeptide is co-administered with a suitable adjuvant.

It will be apparent to those of skill in the art that the therapeutically effective amount of DNA or protein of this invention will depend, inter alia, upon the administration schedule, the unit dose administered, whether the protein or DNA is administered in combination with other therapeutic agents, the immune status and health of the patient, and the therapeutic activity of the particular protein or DNA.

Vaccine compositions are conventionally administered parenterally, e.g., by injection, either subcutaneously or intramuscularly. Methods for intramuscular immunization are described by Wolff et al. (1990) Science 247: 1465-1468 and by Sedegah et al. (1994) Proc. Natl. Acad. Sci. USA 91: 9866-9870. Other modes of administration include oral and pulmonary formulations, suppositories, and transdermal applications. Oral immunization is preferred over parenteral methods for inducing protection against infection by C. albicans (Cain et al. (1993) Vaccine 11: 637-642). Oral formulations include such normally employed excipients as, for example, pharmaceutical grades of mannitol, lactose, starch, magnesium stearate, sodium saccharin, cellulose, magnesium carbonate, and the like.

The vaccine compositions of the invention can include an adjuvant, including, but not limited to, aluminum hydroxide; N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP); N-acetyl-nor-muramyl-L-alanyl-D-isoglutamine (CGP 11637, referred to as nor-MDP); N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1′-2′-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine (CGP 19835A, referred to as MTP-PE); RIBI, which contains three components from bacteria, monophosphoryl lipid A, trehalose dimycolate and cell wall skeleton (MPL+TDM+CWS), in a 2% squalene/Tween 80 emulsion; and cholera toxin. Others which may be used are non-toxic derivatives of cholera toxin, including its B subunit, and/or conjugates or genetically engineered fusions of the C. albicans polypeptide with cholera toxin or its B subunit, procholeragenoid, fungal polysaccharides, including schizophyllan, muramyl dipeptide, muramyl dipeptide derivatives, phorbol esters, labile toxin of E. coli, non-C. albicans fungal lysates, block polymers or saponins.

Other suitable delivery methods include biodegradable microcapsules or immuno-stimulating complexes (ISCOMs), cochleates, or liposomes, genetically engineered attenuated live vectors such as viruses or bacteria, and recombinant (chimeric) virus-like particles, e.g., bluetongue. The amount of adjuvant employed will depend on the type of adjuvant used. For example, when the mucosal adjuvant is cholera toxin, it is suitably used in an amount of 5 mg to 50 mg, for example 10 mg to 35 mg. When used in the form of microcapsules, the amount used will depend on the amount employed in the matrix of the microcapsule to achieve the desired dosage. The determination of this amount is within the ordinary skill in the art.

Carrier systems in humans may include enteric release capsules protecting the antigen from the acidic environment of the stomach, and including C. albicans polypeptide in an insoluble form as fusion proteins. Suitable carriers for the vaccines of the invention are enteric coated capsules and polylactide-glycolide microspheres. Suitable diluents are 0.2 N NaHCO₃ and/or saline.

Vaccines of the invention can be administered as a primary prophylactic agent in adults or in children, as a secondary prevention after successful eradication of C. albicans in an infected host, or as a therapeutic agent with the aim of inducing an immune response in a susceptible host to prevent infection by C. albicans. The vaccines of the invention are administered in amounts readily determined by persons of ordinary skill in the art. For adults, a suitable dosage will be in the range of 10 mg to 10 g, preferably 10 mg to 100 mg. A suitable dosage for adults will also be in the range of 5 mg to 500 mg. Similar dosage ranges will be applicable for children. Those skilled in the art will recognize that the optimal dose may be more or less depending upon the patient's body weight, disease, the route of administration, and other factors. Those skilled in the art will also recognize that appropriate dosage levels can be obtained based on results with known oral vaccines such as, for example, a vaccine based on an E. coli lysate (6 mg dose daily up to a total of 540 mg) and with an enterotoxigenic E. coli purified antigen (4 doses of 1 mg) (Schulman et al., J. Urol. 150: 917-921 (1993); Boedeker et al., American Gastroenterological Assoc. 999: A-222 (1993)). The number of doses will depend upon the disease, the formulation, and efficacy data from clinical trials. Without intending any limitation as to the course of treatment, the treatment can be administered over 3 to 8 doses for a primary immunization schedule over 1 month (Boedeker, American Gastroenterological Assoc. 888: A-222 (1993)).

In a preferred embodiment, a vaccine composition of the invention can be based on a killed whole E. coli preparation with an immunogenic fragment of a C. albicans protein of the invention expressed on its surface, or it can be based on an E. coli lysate, wherein the killed E. coli acts as a carrier or an adjuvant.

It will be apparent to those skilled in the art that some of the vaccine compositions of the invention are useful only for preventing C. albicans infection, some are useful only for treating C. albicans infection, and some are useful for both preventing and treating C. albicans infection. In a preferred embodiment, the vaccine composition of the invention provides protection against C. albicans infection by stimulating humoral and/or cell-mediated immunity against C. albicans. It should be understood that amelioration of any of the symptoms of C. albicans infection is a desirable clinical goal, including a lessening of the dosage of medication used to treat C. albicans-caused disease, or an increase in the production of antibodies in the serum or mucus of patients.

Antibodies Reactive with C. albicans Polypeptides

The invention also includes antibodies specifically reactive with the subject C. albicans polypeptide. Anti-protein/anti-peptide antisera or monoclonal antibodies can be made by standard protocols (see, for example, Antibodies: A Laboratory Manual, ed. by Harlow and Lane (Cold Spring Harbor Press: 1988)). A mammal such as a mouse, a hamster or a rabbit can be immunized with an immunogenic form of the peptide. Techniques for conferring immunogenicity on a protein or peptide include conjugation to carriers or other techniques well known in the art. An immunogenic portion of the subject C. albicans polypeptide can be administered in the presence of adjuvant. The progress of immunization can be monitored by detection of antibody titers in plasma or serum. Standard ELISA or other immunoassays can be used with the immunogen as antigen to assess the levels of antibodies.

In a preferred embodiment, the subject antibodies are immunospecific for antigenic determinants of the C. albicans polypeptides of the invention, e.g., antigenic determinants of a polypeptide of the invention contained in the Sequence Listing, or a closely related human or non-human mammalian homolog (e.g., at least about 90% homologous, more preferably at least about 95% homologous). In yet a further preferred embodiment of the invention, the anti-C. albicans antibodies do not substantially cross-react (i.e., they react specifically) with a protein which is, for example, less than 80% homologous to a sequence of the invention contained in the Sequence Listing. By “not substantially cross-react”, it is meant that the antibody has a binding affinity for a non-homologous protein which is less than 10 percent, more preferably less than 5 percent, and even more preferably less than 1 percent, of the binding affinity for a protein of the invention contained in the Sequence Listing. In a most preferred embodiment, there is no cross-reactivity between fungal and mammalian antigens.
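The percentage thresholds above can be applied mechanically once affinities have been measured. The Python sketch below assumes affinities are reported as equilibrium dissociation constants (K_D, where a lower value means tighter binding), so the relative affinity for an off-target protein is K_D(target) / K_D(off-target); it then reports the most stringent of the 10%, 5% and 1% criteria that the antibody meets. The function names and example values are hypothetical.

```python
from __future__ import annotations


def relative_affinity(kd_target: float, kd_other: float) -> float:
    """Affinity for the other protein as a fraction of affinity for the target,
    computed from dissociation constants (affinity is proportional to 1/K_D)."""
    if kd_target <= 0 or kd_other <= 0:
        raise ValueError("K_D values must be positive")
    return kd_target / kd_other


def cross_reactivity_tier(fraction: float) -> str | None:
    """Most stringent 'does not substantially cross-react' criterion met, or None."""
    for cutoff, label in ((0.01, "<1%"), (0.05, "<5%"), (0.10, "<10%")):
        if fraction < cutoff:
            return label
    return None


if __name__ == "__main__":
    # hypothetical: 1 nM for the C. albicans polypeptide, 1 µM for an unrelated protein
    frac = relative_affinity(1e-9, 1e-6)
    print(f"relative affinity {frac:.2%}: tier {cross_reactivity_tier(frac)}")
```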

The term antibody as used herein is intended to include fragments thereof which are also specifically reactive with C. albicans polypeptides. Antibodies can be fragmented using conventional techniques and the fragments screened for utility in the same manner as described above for whole antibodies. For example, F(ab′)₂ fragments can be generated by treating antibody with pepsin. The resulting F(ab′)₂ fragment can be treated to reduce disulfide bridges to produce Fab′ fragments. The antibody of the invention is further intended to include bispecific and chimeric molecules having an anti-C. albicans portion.

Both monoclonal and polyclonal antibodies (Ab) directed against C. albicans polypeptides or C. albicans polypeptide variants, and antibody fragments such as Fab′ and F(ab′)₂, can be used to block the action of C. albicans polypeptides and allow the study of the role of a particular C. albicans polypeptide of the invention in aberrant or unwanted intracellular signaling, as well as the normal cellular function of C. albicans, by microinjection of anti-C. albicans polypeptide antibodies of the present invention.

Antibodies which specifically bind C. albicans epitopes can also be used in immunohistochemical staining of tissue samples in order to evaluate the abundance and pattern of expression of C. albicans antigens. Anti-C. albicans polypeptide antibodies can be used diagnostically in immunoprecipitation and immunoblotting to detect and evaluate C. albicans levels in tissue or bodily fluid as part of a clinical testing procedure. Likewise, the ability to monitor C. albicans polypeptide levels in an individual can allow determination of the efficacy of a given treatment regimen for an individual afflicted with a C. albicans infection. The level of a C. albicans polypeptide can be measured in cells found in bodily fluid, such as in urine samples, or can be measured in tissue, such as that obtained by gastric biopsy. Diagnostic assays using anti-C. albicans antibodies can include, for example, immunoassays designed to aid in early diagnosis of C. albicans infections. The present invention can also be used as a method of detecting antibodies contained in samples from individuals infected by this fungus, using specific C. albicans antigens.

Another application of anti-C. albicans polypeptide antibodies of the invention is in the immunological screening of cDNA libraries constructed in expression vectors such as λgt11, λgt18-23, λZAP, and λORF8. Messenger libraries of this type, having coding sequences inserted in the correct reading frame and orientation, can produce fusion proteins. For instance, λgt11 will produce fusion proteins whose amino termini consist of β-galactosidase amino acid sequences and whose carboxy termini consist of a foreign polypeptide. Antigenic epitopes of a subject C. albicans polypeptide can then be detected with antibodies, for example by reacting nitrocellulose filters lifted from infected plates with anti-C. albicans polypeptide antibodies. Phage scored by this assay can then be isolated from the infected plate. Thus, the presence of C. albicans gene homologs can be detected and cloned from other species, and alternate isoforms (including splicing variants) can be detected and cloned.

Bio Chip Technology

The nucleic acid sequences or fragments thereof of the present invention lend themselves to the detection of nucleic acid sequences or fragments thereof of C. albicans or other species of Candida using nanotechnology apparatus, compositions and methods, referred to generically herein as “bio chip” technology. Bio chips containing arrays of nucleic acid sequences can also be used to measure expression of genes of C. albicans or other species of Candida. For example, to diagnose a patient with a C. albicans or other Candida infection, a sample from a human or animal can be used as a probe on a bio chip containing an array of nucleic acid sequences from the present invention. In addition, a sample from a disease state can be compared to a sample from a non-disease state, which would help identify a gene that is up-regulated or expressed in the disease state. This would provide valuable insight into the mechanism by which the disease manifests. Changes in gene expression can also be used to identify critical pathways involved in drug transport or metabolism, and may enable the identification of novel targets involved in virulence or host cell interactions involved in maintenance of an infection. Procedures using such techniques have been described by Brown et al., 1995, Science 270: 467-470.
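The disease versus non-disease comparison described above is often reduced to a per-gene log₂ ratio of hybridization signals. The Python sketch below assumes the two signal sets have already been normalized to each other, adds a pseudocount to avoid division by zero, and flags genes whose ratio meets an assumed two-fold (log₂ ≥ 1) cut-off. It ignores replicates and statistical testing, which a real array analysis would need; gene names and signals are hypothetical.

```python
from __future__ import annotations

import math


def log2_ratios(disease: dict[str, float], healthy: dict[str, float],
                pseudocount: float = 1.0) -> dict[str, float]:
    """Per-gene log2(disease / healthy) for genes measured in both samples."""
    return {
        gene: math.log2((disease[gene] + pseudocount) / (healthy[gene] + pseudocount))
        for gene in disease.keys() & healthy.keys()
    }


def upregulated(disease: dict[str, float], healthy: dict[str, float],
                min_log2: float = 1.0) -> list[str]:
    """Genes at least 2**min_log2-fold higher in the disease sample."""
    ratios = log2_ratios(disease, healthy)
    return sorted(g for g, r in ratios.items() if r >= min_log2)


if __name__ == "__main__":
    disease = {"geneA": 399.0, "geneB": 99.0, "geneC": 9.0}  # hypothetical signals
    healthy = {"geneA": 99.0, "geneB": 99.0, "geneC": 39.0}
    print(upregulated(disease, healthy))
```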

Bio chip technology can also be used to monitor the genetic changes of potential therapeutic compounds, including deletions, insertions or mismatches. Once the therapeutic is administered to the patient, changes to the genetic sequence can be evaluated to assess its efficacy. In addition, the nucleic acid sequences of the present invention can be used to determine essential genes in cell cycling. As described in Iyer et al., 1999 (Science, 283: 83-87), genes essential in the cell cycle can be identified using bio chips. Furthermore, the present invention provides nucleic acid sequences which can be used with bio chip technology to understand regulatory networks in bacteria, measure the response to environmental signals or drugs as in drug screening, and study virulence induction (Mons et al., 1998, Nature Biotechnology, 16: 45-48). Patents teaching this technology include U.S. Pat. Nos. 5,445,934, 5,744,305, and 5,800,992.

Kits Containing Nucleic Acids, Polypeptides or Antibodies of the Invention

The nucleic acids, polypeptides and antibodies of the invention can be conveniently combined with other reagents and articles to form kits. Kits for diagnostic purposes typically comprise the nucleic acids, polypeptides or antibodies in vials or other suitable vessels. Kits typically comprise other reagents for performing hybridization reactions, polymerase chain reactions (PCR), or for reconstitution of lyophilized components, such as aqueous media, salts, buffers, and the like. Kits may also comprise reagents for sample processing such as detergents, chaotropic salts and the like. Kits may also comprise immobilization means such as particles, supports, wells, dipsticks and the like. Kits may also comprise labeling means such as dyes, developing reagents, radioisotopes, fluorescent agents, luminescent or chemiluminescent agents, enzymes, intercalating agents and the like. With the nucleic acid and amino acid sequence information provided herein, individuals skilled in the art can readily assemble kits to serve their particular purpose. Kits further can include instructions for use.

Drug Screening Assays Using C. albicans Polypeptides

By making available purified and recombinant C. albicans polypeptides, the present invention provides assays which can be used to screen for drugs which are either agonists or antagonists of the normal cellular function, in this case, of the subject C. albicans polypeptides, or of their role in intracellular signaling. Such inhibitors or potentiators may be useful as new therapeutic agents to combat C. albicans infections in humans. A variety of assay formats will suffice and, in light of the present invention, will be comprehended by the person skilled in the art.

In many drug screening programs which test libraries of compounds and natural extracts, high throughput assays are desirable in order to maximize the number of compounds surveyed in a given period of time. Assays which are performed in cell-free systems, such as may be derived with purified or semi-purified proteins, are often preferred as “primary” screens in that they can be generated to permit rapid development and relatively easy detection of an alteration in a molecular target which is mediated by a test compound. Moreover, the effects of cellular toxicity and/or bioavailability of the test compound can be generally ignored in the in vitro system, the assay instead being focused primarily on the effect of the drug on the molecular target as may be manifest in an alteration of binding affinity with other proteins or change in enzymatic properties of the molecular target. Accordingly, in an exemplary screening assay of the present invention, the compound of interest is contacted with an isolated and purified C. albicans polypeptide.

Screening assays can be constructed in vitro with a purified C. albicans polypeptide or fragment thereof, such as a C. albicans polypeptide having enzymatic activity, such that the activity of the polypeptide produces a detectable reaction product. The efficacy of the compound can be assessed by generating dose response curves from data obtained using various concentrations of the test compound. Moreover, a control assay can also be performed to provide a baseline for comparison. Suitable products include those with distinctive absorption, fluorescence, or chemiluminescence properties, for example, because detection of such products may be easily automated. A variety of synthetic or naturally occurring compounds can be tested in the assay to identify those which inhibit or potentiate the activity of the C. albicans polypeptide. Some of these active compounds may directly, or with chemical alterations to promote membrane permeability or solubility, also inhibit or potentiate the same activity (e.g., enzymatic activity) in whole, live C. albicans cells.
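One simple way to summarize a dose response curve for an inhibitor is the concentration giving 50% inhibition (IC50). The Python sketch below expresses each signal as a percentage of the uninhibited control assay and then interpolates linearly on a log10 concentration scale between the two tested concentrations that bracket 50% activity. This is a deliberately simple stand-in for the nonlinear (for example, four-parameter logistic) curve fitting usually used in practice; the data and names are hypothetical.

```python
from __future__ import annotations

import math


def percent_activity(signal: float, control: float, blank: float = 0.0) -> float:
    """Activity as % of the no-compound control, after subtracting a blank."""
    return 100.0 * (signal - blank) / (control - blank)


def ic50_interpolated(concs: list[float], activities: list[float]) -> float | None:
    """Concentration at which activity crosses 50%, by log-linear interpolation.
    Returns None if the tested range never brackets 50%."""
    points = sorted(zip(concs, activities))
    for (c1, a1), (c2, a2) in zip(points, points[1:]):
        if a1 >= 50.0 >= a2 and a1 != a2:
            frac = (a1 - 50.0) / (a1 - a2)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None


if __name__ == "__main__":
    concs = [0.1, 1.0, 10.0, 100.0]            # µM, hypothetical
    control = 2000.0                            # signal without compound
    signals = [1900.0, 1600.0, 400.0, 100.0]
    acts = [percent_activity(s, control) for s in signals]
    print(f"IC50 ≈ {ic50_interpolated(concs, acts):.2f} µM")
```

For potentiators the same idea applies with the crossing point defined on an increase in activity rather than a decrease.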

Overexpression Assays

Overexpression assays are based on the premise that overproduction of a protein would lead to a higher level of resistance to compounds that selectively interfere with the function of that protein. Overexpression assays may be used to identify compounds that interfere with the function of virtually any type of protein, including without limitation enzymes, receptors, DNA- or RNA-binding proteins, or any proteins that are directly or indirectly involved in regulating cell growth.

Typically, two fungal strains are constructed. One contains a single copy of the gene of interest, and a second contains several copies of the same gene. Identification of useful inhibitory compounds with this type of assay is based on a comparison of the activity of a test compound in inhibiting growth and/or viability of the two strains. The method involves constructing a nucleic acid vector that directs high-level expression of a particular target nucleic acid. The vectors are then transformed into host cells in single or multiple copies to produce strains that express low to moderate levels (strain A) and high levels (strain B) of the protein encoded by the target sequence. Nucleic acid comprising sequences encoding the target gene can, of course, be directly integrated into the host cell.

Large numbers of compounds (or crude substances which may contain active compounds) are screened for their effect on the growth of the two strains. Agents which interfere with an unrelated target equally inhibit the growth of both strains. Agents which interfere with the function of the target at high concentration should inhibit the growth of both strains. It should be possible, however, to titrate out the inhibitory effect of the compound in the overexpressing strain. That is, if the compound is affecting the particular target that is being tested, it should be possible to inhibit the growth of strain A at a concentration of the compound that allows strain B to grow.
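The comparison between strains A and B can be expressed as a simple decision rule over a dilution series. The Python sketch below assumes growth is reported as a fraction of the untreated control for each strain at each concentration, and treats growth below an assumed cut-off of 0.5 as inhibited. Concentrations at which strain A is inhibited while strain B still grows form the selective window that the text describes as evidence that the compound acts on the overexpressed target; a compound acting on an unrelated target, which inhibits both strains equally, yields no window. Names and thresholds are hypothetical.

```python
from __future__ import annotations


def selective_window(growth: dict[float, tuple[float, float]],
                     inhibited_below: float = 0.5) -> list[float]:
    """growth maps concentration -> (growth of strain A, growth of strain B),
    each as a fraction of untreated growth. Returns the concentrations at which
    strain A (low/moderate expression) is inhibited but strain B
    (overexpressing) is not."""
    return sorted(
        conc for conc, (a, b) in growth.items()
        if a < inhibited_below <= b
    )


def interpret(growth: dict[float, tuple[float, float]],
              inhibited_below: float = 0.5) -> str:
    """Coarse classification of one compound's dilution series."""
    if selective_window(growth, inhibited_below):
        return "candidate acting on the target"
    if any(a < inhibited_below and b < inhibited_below for a, b in growth.values()):
        return "inhibits both strains; no evidence of target specificity"
    return "no inhibition in the range tested"


if __name__ == "__main__":
    series = {1.0: (0.9, 0.95), 5.0: (0.2, 0.85), 25.0: (0.05, 0.1)}  # hypothetical
    print(selective_window(series), interpret(series))
```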

Alternatively, a fungal strain is constructed that contains the gene of interest under the control of an inducible promoter. Identification of useful inhibitory agents using this type of assay is based on a comparison of the activity of a test compound in inhibiting growth and/or viability of this strain under both inducing and non-inducing conditions. The method involves constructing a nucleic acid vector that directs high-level expression of a particular target nucleic acid. The vector is then transformed into host cells that are grown under both non-inducing and inducing conditions (conditions A and B, respectively).

Large numbers of compounds (or crude substances which may contain active compounds) are screened for their effect on growth under these two conditions. Agents that interfere with the function of the target should inhibit growth under both conditions. It should be possible, however, to titrate out the inhibitory effect of the compound under the inducing (overexpressing) condition. That is, if the compound is affecting the particular target that is being tested, it should be possible to inhibit growth under condition A at a concentration that allows the strain to grow under condition B.
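For the inducible-promoter variant, the same idea can be summarized as a shift in the minimum inhibitory concentration (MIC) between the two conditions. In the Python sketch below the MIC is taken, as a simplifying assumption, to be the lowest tested concentration at which growth falls below 20% of the untreated control; a fold shift well above 1 under inducing conditions is consistent with the compound acting on the induced target. The cut-off, the data, and the function names are assumptions, and the resolution of the shift is limited by the dilution series actually tested.

```python
from __future__ import annotations


def mic(growth_by_conc: dict[float, float], threshold: float = 0.2) -> float | None:
    """Lowest tested concentration at which growth (fraction of untreated)
    falls below `threshold`; None if growth is never reduced that far."""
    for conc in sorted(growth_by_conc):
        if growth_by_conc[conc] < threshold:
            return conc
    return None


def mic_fold_shift(condition_a: dict[float, float],
                   condition_b: dict[float, float],
                   threshold: float = 0.2) -> float | None:
    """MIC under inducing conditions (B) divided by MIC under non-inducing
    conditions (A). Returns None if either MIC is outside the tested range."""
    mic_a = mic(condition_a, threshold)
    mic_b = mic(condition_b, threshold)
    if mic_a is None or mic_b is None:
        return None
    return mic_b / mic_a


if __name__ == "__main__":
    non_induced = {1.0: 0.9, 2.0: 0.5, 4.0: 0.1, 8.0: 0.05, 16.0: 0.02}  # hypothetical
    induced = {1.0: 0.95, 2.0: 0.9, 4.0: 0.8, 8.0: 0.6, 16.0: 0.1}
    print(mic_fold_shift(non_induced, induced))
```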

Ligand-Binding Assays

Many of the targets according to the invention have functions that have not yet been identified. Ligand-binding assays are useful to identify inhibitor compounds that interfere with the function of a particular target, even when that function is unknown. These assays are designed to detect binding of test compounds to particular targets. The detection may involve direct measurement of binding. Alternatively, indirect indications of binding may involve stabilization of protein structure or disruption of a biological function. Non-limiting examples of useful ligand-binding assays are detailed below.

A useful method for the detection and isolation of binding proteins is the Biomolecular Interaction Assay (BIAcore) system developed by Pharmacia Biosensor and described in the manufacturer's protocol (LKB Pharmacia, Sweden). The BIAcore system uses an affinity purified anti-GST antibody to immobilize GST-fusion proteins onto a sensor chip. The sensor utilizes surface plasmon resonance, an optical phenomenon that detects changes in refractive indices. In accordance with the practice of the invention, a protein of interest is coated onto a chip and test compounds are passed over the chip. Binding is detected by a change in the refractive index (surface plasmon resonance).
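When a series of test compound concentrations is passed over the chip, the equilibrium responses can be used to estimate an apparent dissociation constant. The Python sketch below assumes simple 1:1 binding, for which the equilibrium response is R = Rmax·C/(K_D + C), and estimates K_D and Rmax by linear regression on the Scatchard form R/C = Rmax/K_D - R/K_D. Scatchard linearization distorts experimental error, so nonlinear fitting is generally preferred for real sensor data; the binding model, names and numbers here are assumptions and are not part of the BIAcore protocol referenced above.

```python
from __future__ import annotations


def equilibrium_response(conc: float, rmax: float, kd: float) -> float:
    """1:1 (Langmuir) binding: response at equilibrium for analyte concentration `conc`."""
    return rmax * conc / (kd + conc)


def fit_scatchard(concs: list[float], responses: list[float]) -> tuple[float, float]:
    """Least-squares line through (R, R/C); slope = -1/K_D, intercept = Rmax/K_D.
    Returns (K_D, Rmax) in the units of `concs` and `responses`."""
    xs = responses
    ys = [r / c for r, c in zip(responses, concs)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if n < 2 or sxx == 0:
        raise ValueError("need at least two distinct responses")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope
    return kd, intercept * kd


if __name__ == "__main__":
    concs = [1.0, 5.0, 10.0, 20.0, 50.0]  # hypothetical analyte concentrations (nM)
    resp = [equilibrium_response(c, rmax=100.0, kd=10.0) for c in concs]
    print(fit_scatchard(concs, resp))
```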

A different type of ligand-binding assay involves scintillation proximity assays (SPA, described in U.S. Pat. No. 4,568,649).

Another type of ligand binding assay, also undergoing development, is based on the fact that proteins containing mitochondrial targeting signals are imported into isolated mitochondria in vitro (Hurt et al., 1985, EMBO J. 4: 2061-2068; Eilers and Schatz, Nature, 1986, 322: 228-231). In a mitochondrial import assay, expression vectors are constructed in which nucleic acids encoding particular target proteins are inserted downstream of sequences encoding mitochondrial import signals. The chimeric proteins are synthesized and tested for their ability to be imported into isolated mitochondria in the absence and presence of test compounds. A test compound that binds to the target protein should inhibit its uptake into isolated mitochondria in vitro.

Another ligand-binding assay is the yeast two-hybrid system (Fields and Song, 1989, Nature 340: 245-246). The yeast two-hybrid system takes advantage of the properties of the GAL4 protein of the yeast Saccharomyces cerevisiae. The GAL4 protein is a transcriptional activator required for the expression of genes encoding enzymes of galactose utilization. This protein consists of two separable and functionally essential domains: an N-terminal domain which binds to specific DNA sequences (UASG), and a C-terminal domain containing acidic regions, which is necessary to activate transcription. The native GAL4 protein, containing both domains, is a potent activator of transcription when yeast are grown on galactose media. The N-terminal domain binds to DNA in a sequence-specific manner but is unable to activate transcription. The C-terminal domain contains the activating regions but cannot activate transcription because it fails to be localized to UASG. The two-hybrid system uses two hybrid proteins containing parts of GAL4: (1) a GAL4 DNA-binding domain fused to a protein ‘X’ and (2) a GAL4 activation region fused to a protein ‘Y’. If X and Y can form a protein-protein complex and reconstitute proximity of the GAL4 domains, transcription of a gene regulated by UASG occurs. Creating two hybrid proteins, each containing one of the interacting proteins X and Y, thus allows the GAL4 activation region to be brought to its normal site of action at UASG.

The binding assay described in Fodor et al., 1991, Science 251:767-773, which involves testing the binding affinity of test compounds for a plurality of defined polymers synthesized on a solid substrate, may also be useful.

Compounds which bind to the polypeptides of the invention are potentially useful as antifungal agents for use in therapeutic compositions.

Pharmaceutical formulations suitable for antifungal therapy comprise the antifungal agent in conjunction with one or more biologically acceptable carriers. Suitable biologically acceptable carriers include, but are not limited to, phosphate-buffered saline, saline, deionized water, or the like. Preferred biologically acceptable carriers are physiologically or pharmaceutically acceptable carriers.

The antifungal compositions include an antifungal effective amount of active agent. Antifungal effective amounts are those quantities of the antifungal agents of the present invention that afford prophylactic protection against fungal infections or that result in amelioration or cure of an existing fungal infection. This antifungal effective amount will depend upon the agent, the location and nature of the infection, and the particular host. The amount can be determined by experimentation known in the art, such as by establishing a matrix of dosages and frequencies and comparing a group of experimental units or subjects to each point in the matrix.
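The dose-finding approach just described can be laid out as a simple factorial design. The Python sketch below enumerates every combination of dose and dosing frequency and assigns a group of subject identifiers to each cell of the matrix; the dose levels, frequencies, group size and field names are hypothetical placeholders, not values from this specification.

```python
from itertools import product

def dose_frequency_matrix(doses, frequencies, subjects_per_cell):
    """Build one design cell per (dose, frequency) combination.

    Each cell lists the dose, the number of administrations per day and the
    hypothetical subject IDs assigned to that cell; IDs never repeat.
    """
    cells = []
    next_id = 1
    for dose, freq in product(doses, frequencies):
        subjects = list(range(next_id, next_id + subjects_per_cell))
        next_id += subjects_per_cell
        cells.append({"dose_mg_per_kg": dose, "doses_per_day": freq, "subjects": subjects})
    return cells

# Hypothetical matrix: three dose levels x two frequencies, six subjects per cell.
design = dose_frequency_matrix([1, 5, 25], [1, 2], subjects_per_cell=6)
for cell in design:
    print(cell["dose_mg_per_kg"], "mg/kg,", cell["doses_per_day"], "x/day,",
          len(cell["subjects"]), "subjects")
```

An outcome measured in each cell (for example, fungal burden or survival) would then be compared across the matrix to locate the effective range; in a real study, subjects would be randomized to cells rather than given consecutive IDs as in this sketch.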

The antifungal active agents or compositions can be formed into dosage unit forms, such as, for example, creams, ointments, lotions, powders, liquids, tablets, capsules, suppositories, sprays, aerosols or the like. If the antifungal composition is formulated into a dosage unit form, the dosage unit form may contain an antifungal effective amount of active agent. Alternatively, the dosage unit form may include less than such an amount if multiple dosage unit forms or multiple dosages are to be used to administer a total dosage of the active agent. Dosage unit forms can include, in addition, one or more excipient(s), diluent(s), disintegrant(s), lubricant(s), plasticizer(s), colorant(s), dosage vehicle(s), absorption enhancer(s), stabilizer(s), bactericide(s), or the like.

For general information concerning formulations, see, e.g., Gilman et al. (eds.), 1990, Goodman and Gilman's: The Pharmacological Basis of Therapeutics, 8th ed., Pergamon Press; Remington's Pharmaceutical Sciences, 17th ed., 1990, Mack Publishing Co., Easton, Pa.; Avis et al. (eds.), 1993, Pharmaceutical Dosage Forms: Parenteral Medications, Dekker, New York; and Lieberman et al. (eds.), 1990, Pharmaceutical Dosage Forms: Disperse Systems, Dekker, New York.

The antifungal agents and compositions of the present invention are useful for preventing or treating C. albicans infections. Infection prevention methods incorporate a prophylactically effective amount of an antifungal agent or composition. A prophylactically effective amount is an amount effective to prevent C. albicans infection and will depend upon the specific fungal strain, the agent, and the host. These amounts can be determined experimentally by methods known in the art and as described above.

C. albicans infection treatment methods incorporate a therapeutically effective amount of an antifungal agent or composition. A therapeutically effective amount is an amount sufficient to ameliorate or eliminate the infection. The prophylactically and/or therapeutically effective amounts can be administered in one administration or over repeated administrations. Therapeutic administration can be followed by prophylactic administration once the initial fungal infection has been resolved.

The antifungal agents and compositions can be administered topically or systemically. Topical application is typically achieved by administration of creams, ointments, lotions, or sprays as described above. Systemic administration includes both oral and parenteral routes. Parenteral routes include, without limitation, subcutaneous, intramuscular, intraperitoneal, intravenous, transdermal, inhalation and intranasal administration.

Exemplification

Cloning and Sequencing C. albicans Genomic Sequence

This invention provides nucleotide sequences of the C. albicans genome, which together constitute a DNA sequence library of C. albicans genomic DNA. The detailed description that follows provides nucleotide sequences of C. albicans and describes how the sequences were obtained and how ORFs (open reading frames) and protein-coding sequences can be identified. Also described are methods of using the disclosed C. albicans sequences, including diagnostic and therapeutic applications. Furthermore, the library can be used as a database for the identification and comparison of medically important sequences in this and other strains of C. albicans, as well as in other species of Candida.

Chromosomal DNA from C. albicans strain SC5314 was isolated after Zymolyase digestion, sodium dodecyl sulfate lysis, potassium acetate precipitation, phenol:chloroform extraction and ethanol precipitation (Soll, D. R., T. Srikantha and S. R. Lockhart: Characterizing Developmentally Regulated Genes in C. albicans. In Microbial Genome Methods. K. W. Adolph, editor. CRC Press. New York. p 17-37). Genomic C. albicans DNA was hydrodynamically sheared in an HPLC and then separated on a standard 1% agarose gel. Fractions corresponding to 2500-3000 bp in length were excised from the gel and purified by the GeneClean procedure (Bio101, Inc.).

The purified DNA fragments were then blunt-ended using T4 DNA polymerase. The blunt-ended DNA was then ligated to unique BstXI linker adapters (5′-GTCTTCACCACGGGG-3′ and 5′-GTGGTGAAGAC-3′, in 100- to 1000-fold molar excess). These linkers are complementary to the BstXI-cut pGTC vector, while their overhang is not self-complementary. Therefore, the linkers will not concatemerize, nor will the cut vector easily religate to itself. The linker-adapted inserts were separated from unincorporated linkers on a 1% agarose gel and purified using GeneClean. The linker-adapted inserts were then ligated to BstXI-cut vector to construct “shotgun” subclone libraries.
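The adapter design above can be checked mechanically. The minimal Python sketch below uses only the two oligonucleotide sequences given in the text: it verifies that the 11-mer pairs with the 5′ end of the 15-mer, reports the single-stranded 4-nt 3′ overhang that remains (BstXI itself leaves 4-nt 3′ overhangs), and tests whether that overhang is its own reverse complement. The helper names are illustrative and not part of any published tool.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def annealed_overhang(top: str, bottom: str) -> str:
    """Return the 3' overhang of `top` left after `bottom` anneals to its 5' end.

    Both strands are written 5'->3'; `bottom` must pair with the first
    len(bottom) bases of `top`.
    """
    n = len(bottom)
    if revcomp(bottom) != top[:n]:
        raise ValueError("bottom strand does not pair with the 5' end of the top strand")
    return top[n:]

def is_self_complementary(seq: str) -> bool:
    """True if the sequence equals its own reverse complement (could pair with itself)."""
    return seq == revcomp(seq)

top = "GTCTTCACCACGGGG"   # 15-mer adapter strand from the text
bottom = "GTGGTGAAGAC"     # 11-mer adapter strand from the text
overhang = annealed_overhang(top, bottom)
print(overhang, is_self_complementary(overhang))  # GGGG False
```

Because every adapter end carries the same non-palindromic GGGG overhang, two adapters cannot anneal to each other, which is the property the text relies on to prevent concatemers.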

Only major modifications to the protocols are highlighted. Briefly, the library was transformed into DH5α competent cells (Gibco/BRL, DH5α transformation protocol) and assessed by plating onto antibiotic plates containing ampicillin and IPTG/X-gal. The plates were incubated overnight at 37°C. Transformants were then used for plating of clones and picking for sequencing. The cultures were grown overnight at 37°C. DNA was purified using a silica bead DNA preparation method (Engelstein, 1996). In this manner, 25 μg of DNA was obtained per clone.

These purified DNA samples were then sequenced, primarily using ABI dye-terminator chemistry. All subsequent steps were based on ABI377 automated DNA sequencing. The ABI dye-terminator sequence reads were run on ABI377 machines, and the data were transferred to UNIX machines following lane tracking of the gels. Base calls and quality scores were determined using the program PHRED (Ewing et al., 1998, Genome Res. 8:175-185; Ewing and Green, 1998, Genome Res. 8:186-194). Reads were assembled using PHRAP (P. Green, Abstracts of DOE Human Genome Program Contractor-Grantee Workshop V, January 1996, p. 157) with default program parameters and quality scores. The initial assembly was done at 2.3-fold coverage and yielded 5821 contigs.
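As a rough point of reference for the 2.3-fold coverage figure, the classical Lander-Waterman model (a standard idealization, not a method described in this specification) treats reads as placed uniformly at random, so the number of reads covering a given base is approximately Poisson-distributed with mean c. The sketch below computes the expected covered fraction, 1 - e^(-c), and, when the number of reads N is supplied, the expected number of contigs, N·e^(-c). Repeats, heterozygosity in a diploid genome and cloning bias all make real assemblies deviate from these numbers.

```python
import math
from typing import Optional, Tuple

def lander_waterman(coverage: float, n_reads: Optional[int] = None) -> Tuple[float, Optional[float]]:
    """Idealized Lander-Waterman expectations for random shotgun sequencing.

    coverage: mean fold coverage c (total bases sequenced / genome size).
    Returns (expected fraction of the genome covered by at least one read,
    expected number of contigs). The contig estimate N * exp(-c) is only
    computed when the number of reads N is given, and it ignores the
    minimum overlap an assembler needs to join two reads.
    """
    uncovered = math.exp(-coverage)  # Poisson probability that a base has zero reads
    contigs = None if n_reads is None else n_reads * uncovered
    return 1.0 - uncovered, contigs

covered, _ = lander_waterman(2.3)
print(f"expected fraction covered at 2.3x: {covered:.3f}")  # 0.900
```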

Finishing could follow the initial assembly. Missing mates (sequences from clones that gave reads from only one end of the Candida DNA inserted in the plasmid) could be identified and sequenced with ABI technology to allow the identification of additional overlapping contigs.

End-sequencing of randomly picked genomic lambda clones was also performed; all lambda clones were sequenced from both ends. The lambda library backbone helped to verify the integrity of the assembly and allowed closure of some of the physical gaps. Primers for walking off the ends of contigs would be selected near the ends of the clones using pick_primer (a GTC program) to facilitate gap closure. These walks could be sequenced using the selected clones and primers, and the data then reassembled with PHRAP. Additional sequencing using PCR-generated templates and screened and/or unscreened lambda templates could also be done.
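The specification names pick_primer, a GTC program, for choosing walking primers but does not describe its selection rules. Purely as a hypothetical stand-in, the sketch below scans the last 150 bases of a contig for a 20-mer whose GC content and Wallace-rule melting temperature, Tm = 2(A+T) + 4(G+C), fall within commonly used ranges. The window, ranges and function names are assumptions; a production primer picker would also screen for self-complementarity, primer dimers and uniqueness within the assembly.

```python
def wallace_tm(primer: str) -> int:
    """Approximate melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))

def pick_walking_primer(contig, length=20, window=150, gc_range=(0.40, 0.60), tm_range=(56, 64)):
    """Return (start, primer) for a forward primer near the contig's 3' end, or None.

    Candidates are scanned from the contig end inward, so the chosen primer
    sits as close to the end as the GC and Tm criteria allow and the walk
    reads as far as possible past the end of the contig.
    """
    contig = contig.upper()
    lowest_start = max(0, len(contig) - window)
    for start in range(len(contig) - length, lowest_start - 1, -1):
        primer = contig[start:start + length]
        gc = (primer.count("G") + primer.count("C")) / length
        if gc_range[0] <= gc <= gc_range[1] and tm_range[0] <= wallace_tm(primer) <= tm_range[1]:
            return start, primer
    return None

# Hypothetical contig; to walk off the other end, pass the contig's reverse complement.
print(pick_walking_primer("ATTA" * 40 + "GGCATCGATTCAGCTAGGCTAAT"))
```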

To identify C. albicans polypeptides, the complete genomic sequence of C. albicans was analyzed essentially as follows. First, all possible stop-to-stop open reading frames (ORFs) greater than 180 nucleotides in all six reading frames were translated into amino acid sequences. Second, the identified ORFs were analyzed for homology to known (archaebacterial, prokaryotic and eukaryotic) protein sequences. Third, the coding potential of non-homologous sequences was evaluated with the program GENEMARK™ (Borodovsky and McIninch, 1993, Comp. Chem. 17:123).
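The first analysis step, collecting stop-to-stop ORFs longer than 180 nucleotides in all six reading frames and translating them, can be sketched in Python as follows. Because C. albicans decodes the CUG codon as serine rather than leucine, the sketch overrides the standard genetic code for that one codon. The homology search and GENEMARK™ steps are not reproduced, and the function names are illustrative. Coordinates are 0-based on the strand scanned, and a segment running from the start of a frame to its first stop is also reported, since it is bounded by the sequence edge rather than by a stop.

```python
# Standard genetic code (NCBI table 1) as a 64-letter string, codons ordered T, C, A, G.
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: STANDARD_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
CODON_TABLE["CTG"] = "S"  # C. albicans decodes CUG as serine instead of leucine
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(nt: str) -> str:
    """Translate codon by codon; codons with non-ACGT characters become 'X'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))

def stop_to_stop_orfs(seq: str, min_len: int = 180):
    """Yield (strand, frame, start, end, protein) for ORFs longer than min_len nt.

    An ORF runs from just after one in-frame stop codon (or the start of the
    frame) up to, but not including, the next in-frame stop codon.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1])):
        for frame in range(3):
            start = frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOPS:
                    if i - start > min_len:
                        yield strand, frame, start, i, translate(s[start:i])
                    start = i + 3
```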

Identification, Cloning and Expression of C. albicans Nucleic Acids

Expression and purification of the C. albicans polypeptides of the invention can be performed essentially as outlined below.

To facilitate the cloning, expression and purification of membrane and secreted proteins from C. albicans, a gene expression system for cloning and expression of recombinant proteins in E. coli, such as the pET System (Novagen), is selected. Also, a DNA sequence encoding a peptide tag, the His-Tag, is fused to the 3′ end of the DNA sequences of interest in order to facilitate purification of the recombinant protein products. The 3′ end is selected for fusion in order to avoid alteration of any 5′ terminal signal sequence.

PCR Amplification and Cloning of Nucleic Acids Containing ORFs Encoding Enzymes

Nucleic acids chosen for cloning (for example, from the nucleic acids set forth in SEQ ID NO: 1-SEQ ID NO: 14103) from C. albicans strain SC5314 are prepared for amplification cloning by polymerase chain reaction (PCR). Synthetic oligonucleotide primers specific for the 5′ and 3′ ends of open reading frames (ORFs) are designed and purchased from GibcoBRL Life Technologies (Gaithersburg, Md., USA). All forward primers (specific for the 5′ end of the sequence) are designed to include an NcoI cloning site at the extreme 5′ terminus. These primers are designed to permit initiation of protein translation at a methionine residue followed by a valine residue and the coding sequence for the remainder of the native C. albicans DNA sequence. All reverse primers (specific for the 3′ end of any C. albicans ORF) include an EcoRI site at the extreme 5′ terminus to permit cloning of each C. albicans sequence into the reading frame of pET-28b. The pET-28b vector provides sequence encoding an additional 20 carboxy-terminal amino acids, including six histidine residues at the extreme C-terminus, which comprise the His-Tag.
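The primer layout described above can be sketched as follows. The 4-base 5′ clamp (GCGC) and the 20-base annealing length are assumptions added for illustration. The forward primer places the NcoI site (CCATGG) so that its ATG supplies the initiating methionine, and the appended TG completes one of the valine codons (GTG here); the reverse primer omits the stop codon so that translation can continue into the vector-encoded His-Tag. Whether extra bases are needed between the EcoRI site and the insert to keep it in frame with the pET-28b tag is not handled and would have to be checked against the vector map. Because the NcoI and EcoRI sites are palindromic, checking one strand of the ORF for internal sites is sufficient.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
NCOI, ECORI = "CCATGG", "GAATTC"
STOP_CODONS = {"TAA", "TAG", "TGA"}

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def design_primers(orf: str, anneal_len: int = 20, clamp: str = "GCGC"):
    """Return hypothetical (forward, reverse) cloning primers, written 5'->3'.

    forward = clamp + NcoI (its ATG = Met) + 'TG' (completes GTG = Val)
              + the native coding sequence after the start codon
    reverse = clamp + EcoRI + reverse complement of the ORF's 3' end,
              stop codon removed so the C-terminal His-Tag can be fused
    """
    orf = orf.upper()
    if not orf.startswith("ATG") or orf[-3:] not in STOP_CODONS:
        raise ValueError("expected an ORF that starts with ATG and ends with a stop codon")
    for site in (NCOI, ECORI):
        if site in orf:
            raise ValueError(f"internal {site} site would be cut during digestion")
    body = orf[3:-3]  # coding sequence between the start and stop codons
    forward = clamp + NCOI + "TG" + body[:anneal_len]
    reverse = clamp + ECORI + revcomp(body[-anneal_len:])
    return forward, reverse

fwd, rev = design_primers("ATG" + "AAACCCGGGTTT" * 2 + "TAA")
print(fwd)
print(rev)
```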

Genomic DNA prepared from C. albicans strain SC5314 is used as the source of template DNA for PCR amplification reactions (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). To amplify a DNA sequence containing a C. albicans ORF, genomic DNA (50 nanograms) is introduced into a reaction vial containing 2 mM MgCl₂, 1 micromolar synthetic oligonucleotide primers (forward and reverse primers) complementary to and flanking a defined C. albicans ORF, 0.2 mM of each deoxynucleotide triphosphate (dATP, dGTP, dCTP and dTTP) and 2.5 units of heat-stable DNA polymerase (AmpliTaq, Roche Molecular Systems, Inc., Branchburg, N.J., USA) in a final volume of 100 microliters.
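The reaction composition above translates into pipetting volumes through C1V1 = C2V2. In the sketch below, only the final concentrations and amounts come from the text; every stock concentration, and the 10X reaction buffer (assumed here to lack MgCl2, since MgCl2 is added separately), is an assumption to be replaced with the values on the actual reagent labels.

```python
def stock_volume_ul(final_conc: float, stock_conc: float, final_vol_ul: float) -> float:
    """C1V1 = C2V2: microliters of stock giving final_conc in final_vol_ul.

    final_conc and stock_conc must be in the same units.
    """
    return final_conc * final_vol_ul / stock_conc

def pcr_reaction(final_vol_ul: float = 100.0, template_ng: float = 50.0,
                 template_ng_per_ul: float = 50.0, enzyme_u_per_ul: float = 5.0) -> dict:
    """Pipetting volumes (uL) for one reaction; all stock concentrations are assumptions."""
    vols = {
        "10X buffer (assumed, Mg-free)": stock_volume_ul(1, 10, final_vol_ul),
        "MgCl2, 25 mM stock": stock_volume_ul(2, 25, final_vol_ul),             # 2 mM final
        "forward primer, 10 uM stock": stock_volume_ul(1, 10, final_vol_ul),    # 1 uM final
        "reverse primer, 10 uM stock": stock_volume_ul(1, 10, final_vol_ul),    # 1 uM final
        "dNTP mix, 10 mM each": stock_volume_ul(0.2, 10, final_vol_ul),         # 0.2 mM each
        "DNA polymerase": 2.5 / enzyme_u_per_ul,                                 # 2.5 units
        "genomic DNA": template_ng / template_ng_per_ul,                         # 50 ng template
    }
    vols["water"] = final_vol_ul - sum(vols.values())  # bring to the final volume
    if vols["water"] < 0:
        raise ValueError("stocks too dilute for the requested final volume")
    return vols

for reagent, ul in pcr_reaction().items():
    print(f"{reagent}: {ul:.1f} uL")
```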

Upon completion of the thermal cycling reactions, each sample of amplified DNA is washed and purified using the QIAquick Spin PCR purification kit (Qiagen, Gaithersburg, Md., USA). All amplified DNA samples are subjected to digestion with restriction endonucleases, e.g., NcoI and EcoRI (New England BioLabs, Beverly, Mass., USA) (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). DNA samples are then subjected to electrophoresis on 1.0% NuSieve (FMC BioProducts, Rockland, Me., USA) agarose gels. DNA is visualized by staining with ethidium bromide and exposure to long-wave UV irradiation. DNA contained in slices isolated from the agarose gel is purified using the Bio 101 GeneClean Kit protocol (Bio 101, Vista, Calif., USA).

Cloning of C. albicans Nucleic Acids into an Expression Vector

The pET-28b vector is prepared for cloning by digestion with restriction endonucleases, e.g., NcoI and EcoRI (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). The pET-28a vector, which encodes a His-Tag that can be fused to the 5′ end of an inserted gene, is prepared by digestion with appropriate restriction endonucleases.

Following digestion, DNA inserts are cloned (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994) into the previously digested pET-28b expression vector. Products of the ligation reaction are then used to transform the BL21 strain of E. coli (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994) as described below.

Transformation of Competent Bacteria with Recombinant Plasmids

Competent bacteria, E. coli strain BL21 or E. coli strain BL21(DE3), are transformed with recombinant pET expression plasmids carrying the cloned C. albicans sequences according to standard methods (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). Briefly, 1 microliter of ligation reaction is mixed with 50 microliters of electrocompetent cells and subjected to a high-voltage pulse, after which the samples are incubated in 0.45 milliliters of SOC medium (0.5% yeast extract, 2.0% tryptone, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl₂, 10 mM MgSO₄ and 20 mM glucose) at 37°C with shaking for 1 hour. Samples are then spread on LB agar plates containing 25 micrograms/ml kanamycin sulfate for growth overnight. Transformed colonies of BL21 are then picked and analyzed to evaluate cloned inserts as described below.
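The SOC composition above is given as percentages and millimolar concentrations. A small helper like the following converts it into grams per batch; it is a sketch in which the molecular weights are those of the anhydrous salts and anhydrous glucose, so hydrated forms such as MgCl2·6H2O would need their own values.

```python
# Composition of SOC medium as given in the text.
PERCENT_W_V = {"yeast extract": 0.5, "tryptone": 2.0}
MILLIMOLAR = {"NaCl": 10, "KCl": 2.5, "MgCl2": 10, "MgSO4": 10, "glucose": 20}

# Molecular weights (g/mol) of the anhydrous forms; adjust if using hydrates.
MOLECULAR_WEIGHT = {"NaCl": 58.44, "KCl": 74.55, "MgCl2": 95.21,
                    "MgSO4": 120.37, "glucose": 180.16}

def soc_grams(volume_l: float = 1.0) -> dict:
    """Return grams of each SOC component needed for volume_l liters."""
    grams = {}
    for name, pct in PERCENT_W_V.items():
        grams[name] = pct * 10.0 * volume_l  # x% w/v = x g per 100 mL = 10x g per L
    for name, mm in MILLIMOLAR.items():
        grams[name] = mm / 1000.0 * MOLECULAR_WEIGHT[name] * volume_l  # mol/L * g/mol * L
    return grams

for component, g in soc_grams(0.5).items():
    print(f"{component}: {g:.3f} g per 500 mL")
```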

Identification of Recombinant Expression Vectors with C. albicans Nucleic Acids

Individual BL21 clones transformed with recombinant pET-28b C. albicans ORFs are analyzed by PCR amplification of the cloned inserts using the same forward and reverse primers, specific for each C. albicans sequence, that were used in the original PCR amplification cloning reactions. Successful amplification verifies the integration of the C. albicans sequences in the expression vector (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994).

Isolation and Preparation of Nucleic Acids From Transformants

Individual clones of recombinant pET-28b vectors carrying properly cloned C. albicans ORFs are picked and incubated overnight in 5 ml of LB broth plus 25 micrograms/ml kanamycin sulfate. The following day, plasmid DNA is isolated and purified using the Qiagen plasmid purification protocol (Qiagen Inc., Chatsworth, Calif., USA).

Expression of Recombinant C. albicans Sequences in E. coli

The pET vector can be propagated in any E. coli K-12 strain, e.g., HMS174, HB101, JM109, DH5, etc., for the purpose of cloning or plasmid preparation. Hosts for expression include E. coli strains containing a chromosomal copy of the gene for T7 RNA polymerase. These hosts are lysogens of bacteriophage DE3, a lambda derivative that carries the lacI gene, the lacUV5 promoter and the gene for T7 RNA polymerase. T7 RNA polymerase is induced by addition of isopropyl-β-D-thiogalactoside (IPTG), and the T7 RNA polymerase transcribes any target plasmid, such as pET-28b, carrying its gene of interest. Strains used include BL21(DE3) (Studier, F. W., Rosenberg, A. H., Dunn, J. J., and Dubendorff, J. W. (1990) Meth. Enzymol. 185, 60-89).

To express recombinant C. albicans sequences, 50 nanograms of plasmid DNA isolated as described above is used to transform competent BL21(DE3) bacteria (provided by Novagen as part of the pET expression system kit) as described above. The lacZ gene (beta-galactosidase) is expressed in the pET system in the same manner as the C. albicans recombinant constructs. Transformed cells are cultured in SOC medium for 1 hour, and the culture is then plated on LB plates containing 25 micrograms/ml kanamycin sulfate. The following day, bacterial colonies are pooled and grown in LB medium containing kanamycin sulfate (25 micrograms/ml) to an optical density at 600 nm of 0.5 to 1.0 units, at which point IPTG is added to the culture at 1 millimolar for 3 hours to induce expression of the C. albicans recombinant DNA constructs.

After induction of gene expression with IPTG, bacteria are pelleted by centrifugation in a Sorvall RC-3B centrifuge at 3500×g for 15 minutes at 4° C. Pellets are resuspended in 50 milliliters of cold 10 mM Tris-HCl, pH 8.0, 0.1 M NaCl and 0.1 mM EDTA (STE buffer). Cells are then centrifuged at 2000×g for 20 min at 4° C. Wet pellets are weighed and frozen at −80° C until ready for protein purification.
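Centrifuges are often set in rpm rather than ×g. The standard conversion RCF = 1.118 × 10^-5 × r × rpm², with r the rotor radius in centimeters, can be applied as in the sketch below. The 20 cm radius is a placeholder, not the radius of any particular Sorvall RC-3B rotor; the value for the rotor actually in use should be taken from its documentation.

```python
import math

RCF_FACTOR = 1.118e-5  # RCF (x g) = 1.118e-5 * r(cm) * rpm^2

def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at radius_cm for a given rotor speed."""
    return RCF_FACTOR * radius_cm * rpm ** 2

def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach rcf (x g) at radius_cm."""
    return math.sqrt(rcf / (RCF_FACTOR * radius_cm))

RADIUS_CM = 20.0  # hypothetical rotor radius; substitute the real rotor's value
for rcf in (3500, 2000):  # the two spins described in the text
    print(f"{rcf} x g -> {rpm_from_rcf(rcf, RADIUS_CM):.0f} rpm at r = {RADIUS_CM} cm")
```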

A variety of methodologies known in the art can be used to purify the isolated proteins (Current Protocols in Protein Science, John Wiley and Sons, Inc., J. E. Coligan et al., eds., 1995). For example, the frozen cells are thawed, resuspended in buffer and ruptured by several passages through a small-volume microfluidizer (Model M-110S, Microfluidics International Corporation, Newton, Mass.). The resulting homogenate is centrifuged to yield a clear supernatant (crude extract), and following filtration the crude extract is fractionated over columns. Fractions are monitored by absorbance at 280 nm, and peak fractions may be analyzed by SDS-PAGE.

The concentrations of purified protein preparations are quantified spectrophotometrically using absorbance coefficients calculated from amino acid content (Perkins, S. J. 1986 Eur. J. Biochem. 157, 169-180). Protein concentrations are also measured by the method of Bradford, M. M. (1976) Anal. Biochem. 72, 248-254, and Lowry, O. H., Rosebrough, N., Farr, A. L. & Randall, R. J. (1951) J. Biol. Chem. 193, pages 265-275, using bovine serum albumin as a standard.
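Computing an absorbance coefficient from amino acid content amounts to summing per-residue contributions and then applying the Beer-Lambert law. The sketch below substitutes the widely used per-residue values of Pace et al. (1995) (Trp 5500, Tyr 1490 and 125 per cystine, in M^-1 cm^-1) for the Perkins (1986) coefficients cited above, which may differ somewhat; the example protein is hypothetical. For His-Tag fusions, the sequence passed in should include the vector-encoded residues.

```python
def epsilon_280(sequence: str, n_cystines: int = 0) -> int:
    """Molar absorption coefficient at 280 nm (M^-1 cm^-1) from amino acid content.

    Per-residue values from Pace et al. (1995): Trp 5500, Tyr 1490 and
    125 per cystine (disulfide-bonded Cys pair). Free cysteines are ignored.
    """
    seq = sequence.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * n_cystines

def concentration_mg_per_ml(a280: float, epsilon: float, mw_da: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c(M) = A / (epsilon * l); times MW (g/mol) gives g/L, i.e. mg/mL."""
    return a280 / (epsilon * path_cm) * mw_da

# Illustrative numbers only: a hypothetical 30 kDa protein with 4 Trp and 10 Tyr.
eps = epsilon_280("W" * 4 + "Y" * 10)
print(eps, round(concentration_mg_per_ml(0.65, eps, 30000), 3))
```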

SDS-polyacrylamide gels of various concentrations are purchased from BioRad (Hercules, Calif., USA) and stained with Coomassie blue. Molecular weight markers may include rabbit skeletal muscle myosin (200 kDa), E. coli β-galactosidase (116 kDa), rabbit muscle phosphorylase B (97.4 kDa), bovine serum albumin (66.2 kDa), ovalbumin (45 kDa), bovine carbonic anhydrase (31 kDa), soybean trypsin inhibitor (21.5 kDa), egg white lysozyme (14.4 kDa) and bovine aprotinin (6.5 kDa).
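The apparent molecular weight of a band is commonly estimated from markers such as those listed above by fitting log10(MW) against relative mobility (Rf), a relationship that is approximately linear over a gel's resolving range. In the sketch below, the marker sizes are those named in the text, while the Rf values are hypothetical; in practice, only markers that fall within the linear range of the particular gel percentage should be used.

```python
import math

def fit_log_mw(markers):
    """Least-squares fit of log10(MW) = slope * Rf + intercept.

    markers: iterable of (molecular weight in kDa, relative mobility Rf) pairs.
    """
    pts = list(markers)
    xs = [rf for _, rf in pts]
    ys = [math.log10(mw) for mw, _ in pts]
    n = len(pts)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_mw_kda(markers, rf_unknown):
    """Interpolate the apparent MW (kDa) of a band from its Rf using the fitted line."""
    slope, intercept = fit_log_mw(markers)
    return 10 ** (slope * rf_unknown + intercept)

# Marker sizes (kDa) from the text paired with hypothetical Rf values.
markers = [(200, 0.05), (116, 0.16), (97.4, 0.20), (66.2, 0.29), (45, 0.40),
           (31, 0.50), (21.5, 0.62), (14.4, 0.75), (6.5, 0.92)]
print(f"band at Rf 0.45: ~{estimate_mw_kda(markers, 0.45):.0f} kDa")
```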

EQUIVALENTS

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments and methods described herein. The specific embodiments described herein are offered by way of example only, and the invention is to be limited only by the terms of the appended claims, along with the full scope of equivalents to which such claims are entitled.

LENGTHY TABLE: This patent application contains a lengthy table section (US20070027309A1-20070201-T00001). A copy of the table is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/?pageRequest=docDetail&DocID=US20070027309A1). An electronic copy of the table will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).

1. An isolated nucleic acid comprising a nucleotide sequence encoding a C. albicans polypeptide selected from the group consisting of SEQ ID NO: 14104-SEQ ID NO: 28206.
2. A recombinant expression vector comprising the nucleic acid of claim 1 operably linked to a transcription regulatory element.
3. A cell comprising a recombinant expression vector of claim 2.
4. A method for producing a C. albicans polypeptide comprising culturing a cell of claim 3 under conditions that permit expression of the polypeptide.
5. An isolated nucleic acid comprising a nucleotide sequence encoding a C. albicans polypeptide or a fragment thereof, said nucleic acid selected from the group consisting of SEQ ID NO: 1-SEQ ID NO: 14103.
6. A recombinant expression vector comprising the nucleic acid of claim 5 operably linked to a transcription regulatory element.
7. A cell comprising a recombinant expression vector of claim 6.
8. A method for producing a C. albicans polypeptide comprising culturing a cell of claim 7 under conditions that permit expression of the polypeptide.
9. A probe comprising a nucleotide sequence having at least 8 consecutive nucleotides of a nucleotide sequence selected from the group consisting of SEQ ID NO: 1-SEQ ID NO: 14103.
10. An isolated nucleic acid comprising a nucleotide sequence of at least 8 nucleotides in length, wherein the sequence is hybridizable to a nucleic acid having a nucleotide sequence selected from the group consisting of SEQ ID NO: 1-SEQ ID NO: 14103.
11. A vaccine composition for prevention or treatment of a C. albicans infection comprising an effective amount of a nucleic acid of claim 5 and a pharmaceutically acceptable carrier.
12. A vaccine composition of claim 11, further comprising an adjuvant.
13. A vaccine composition of claim 11, further comprising one or more additional active ingredients.
14. A method of treating a subject for C. albicans infection comprising administering to a subject a vaccine composition of claim 11, 12 or 13, such that treatment of C. albicans infection occurs.
15. A method of claim 14, wherein the treatment is a prophylactic treatment.
16. A method of claim 14, wherein the treatment is a therapeutic treatment.
17. A recombinant or substantially pure preparation of a C. albicans polypeptide or a fragment thereof, wherein said polypeptide is selected from the group consisting of SEQ ID NO: 14104-SEQ ID NO: 28206.
18. A vaccine composition for prevention or treatment of a C. albicans infection comprising an effective amount of a C. albicans polypeptide of claim 17 and a pharmaceutically acceptable carrier.
19. A vaccine composition of claim 18, further comprising an adjuvant.
20. A vaccine composition of claim 18, further comprising one or more additional active ingredients.
21. A method of treating a subject for C. albicans infection comprising administering to a subject a vaccine composition of claim 18, 19 or 20, such that treatment of C. albicans infection occurs.
22. A method of claim 21, wherein the treatment is a prophylactic treatment.
23. A method of claim 21, wherein the treatment is a therapeutic treatment.
24. A method for detecting the presence of a Candida nucleic acid in a sample comprising: (a) contacting a sample with a nucleic acid of claim 5 under conditions in which a hybrid can form between the nucleic acid of claim 5 and a Candida nucleic acid in the sample; and (b) detecting the hybrid formed in step (a), wherein detection of a hybrid indicates the presence of a Candida nucleic acid in the sample.
25. A computer readable medium having recorded thereon the nucleotide sequences depicted in SEQ ID NO: 1-SEQ ID NO: 14103 or fragments thereof.
26. A computer based system for identifying fragments of the Candida genome of commercial importance comprising the following elements: a) a data storage means comprising the nucleotide sequences SEQ ID NO: 1-SEQ ID NO: 14103 or fragments thereof; b) a search means for comparing a target sequence to the nucleotide sequences of the data storage means of step (a) to identify homologous sequences; and c) a retrieval means for obtaining said homologous sequence(s) of step (b).
27. A method of identifying commercially important nucleic acid fragments of the Candida genome comprising the step of comparing a database comprising the nucleotide sequences SEQ ID NO: 1-SEQ ID NO: 14103 or fragments thereof with a target sequence to obtain a nucleic acid molecule comprised of a complementary nucleotide sequence to said target sequence, wherein said target sequence is not randomly selected.
28. A method for identifying an expression modulating fragment of the Candida genome comprising the step of comparing a database comprising the nucleotide sequences SEQ ID NO: 1-SEQ ID NO: 14103 or fragments thereof with a target sequence to obtain a nucleic acid molecule comprised of a complementary nucleotide sequence to said target sequence, wherein said target sequence comprises sequences known to regulate gene expression.